In this paper we present a complete iteration complexity analysis of inexact first order Lagrangian and penalty methods for solving cone constrained convex problems that may or may not have optimal Lagrange multipliers that close the duality gap. We first assume the existence of optimal Lagrange multipliers and study primal-dual first order methods based on inexact information and augmented Lagrangian smoothing or Nesterov type smoothing.

For inexact (fast) gradient augmented Lagrangian methods we derive a total computational complexity of $\mathcal{O}(1/\epsilon)$ projections onto a simple primal set in order to attain an $\epsilon$-optimal solution of the conic convex problem. For the inexact fast gradient method combined with Nesterov type smoothing we derive a computational complexity of $\mathcal{O}(1/\epsilon^{3/2})$ projections onto the same set. Then, we assume that optimal Lagrange multipliers for the cone constrained convex problem might not exist, and analyze the fast gradient method for solving penalty reformulations of the problem. For the fast gradient method combined with the penalty framework we also derive a total computational complexity of $\mathcal{O}(1/\epsilon^{3/2})$ projections onto a simple primal set to attain an $\epsilon$-optimal solution of the original problem.

Keywords: conic convex problems, smooth (augmented) dual functions, penalty functions, 
(augmented) dual first order methods, penalty fast gradient methods, approximate primal 
solution, computational complexity. 
1. Introduction

Many recent engineering and economic applications can be posed as large-scale conic convex problems, and thus the interest in scalable algorithms with inexpensive iterations is continuously increasing. For instance, in the recent optimization literature, first order methods have gained much attention since they have cheap iterations and are usually adequate for the large-scale convex setting. In the constrained case, when there are complicated conic constraints, many first order algorithms are combined with duality or penalty strategies. For example, in [?] various smooth and nonsmooth formulations are provided for cone programming, and through the application of first order methods (e.g. fast gradient or mirror descent) to the corresponding reformulations of the optimality conditions as optimization problems, an $\epsilon$-optimal solution is obtained in $\mathcal{O}(1/\epsilon)$ projections onto a simple primal set. For conic constrained convex problems, quadratic penalty strategies are combined with the fast gradient method in [9]. Under the assumptions of a smooth objective function and the existence of a finite optimal Lagrange multiplier, the first order quadratic penalty method in [9] requires $\mathcal{O}(1/\epsilon^2)$ fast gradient iterations. Moreover, using a regularization of the original problem with a strongly convex term, this method requires $\mathcal{O}(\frac{1}{\epsilon}\log(\frac{1}{\epsilon}))$ fast gradient iterations. Recently, other first order augmented Lagrangian methods were presented in [?], and computational complexity estimates of order $\mathcal{O}(1/\epsilon)$ were obtained for smooth problems with bounded optimal Lagrange multipliers. First order methods are also combined with duality and Nesterov type smoothing in [?, 23, ?], and convergence rates of order $\mathcal{O}(1/\epsilon)$ in terms of dual gradient evaluations are derived. Another interesting approach relies on reformulating conic constrained programming problems as a monotone variational inequality and then designing various algorithms for solving these inequalities. This approach can be found in [?], where different primal-dual methods are devised for solving the variational inequality under the assumption that the primal and dual feasible sets are bounded. Recently, the boundedness condition has been eliminated in [?].

Motivation. However, the following issues can be identified in the existing literature: 

(a) Most of the existing papers on dual first order methods combined with smoothing techniques derive convergence rates in terms of outer iterations (number of dual gradient evaluations). However, we will show (see e.g., Theorem 3.4) that one might choose an appropriate value of the smoothing parameter such that an $\epsilon$-solution is obtained after a single outer iteration. Thus, convergence rates in terms of outer iterations are not relevant in this case, and it is natural to analyze the overall complexity of these methods, which also takes into account the inner iterations (e.g., the number of projections onto the primal feasible set or the number of matrix-vector multiplications).

(b) Moreover, to our knowledge, there is no complete analysis in the optimization literature of the overall complexity of inexact dual first order methods based on augmented Lagrangian smoothing and Nesterov smoothing that clarifies which smoothing approach behaves better.

(c) Finally, all the papers on Lagrangian and penalty methods mentioned above make the strong assumption that there exists an optimal Lagrange multiplier for the primal convex problem that closes the duality gap. This property is usually guaranteed through a Slater type condition, which in the large-scale setting is very difficult to check computationally, or might not even hold. Recently, Nesterov developed in [22] subgradient methods for nonsmooth convex problems with functional constraints without this assumption on the existence of an optimal Lagrange multiplier, and proved that an $\epsilon$-optimal point can be attained after $\mathcal{O}(1/\epsilon^2)$ subgradient evaluations for either the objective function or a functional constraint. Nesterov also asks in [22] whether it is possible to improve this convergence rate under additional smoothness assumptions on the objective function and the functional constraints.


Contributions. These issues motivate our work here. In this paper we present a complete iteration complexity analysis of inexact first order Lagrangian and penalty methods for solving cone constrained convex problems that may or may not have optimal Lagrange multipliers that close the duality gap. In the first part of our paper we assume the existence of optimal Lagrange multipliers and derive the overall complexity of primal-dual first order methods based on the inexact oracle framework [6] and augmented Lagrangian smoothing [?] or Nesterov type smoothing [?]. Although in some cases we obtain complexity results similar to those found in the literature, our analysis based on the inexact oracle framework is simpler, more intuitive and more elegant, opening various possibilities for extensions to more complex optimization models. Moreover, for some optimality criteria our computational complexities are significantly better than those found in the existing literature. These better complexities are achieved through the new first order inexact oracles for augmented Lagrangian (Nesterov) smoothing derived in Theorem 3.2 (Theorem 3.11), which substantially improve those in [6]. In the second part we assume that the conic constrained convex problem might not admit an optimal Lagrange multiplier. In this case, we combine the fast gradient method with penalty strategies and derive computational complexity certifications for such methods, which consistently improve those given in [22] for the nonsmooth case. Thus, our results cover the particular case when the Slater condition does not hold or is difficult to check for large-scale conic convex problems, and they answer Nesterov's question positively. To the best of our knowledge, this paper presents one of the first computational complexity results for first order penalty methods for convex problems when optimal Lagrange multipliers do not exist. More explicitly, our contributions are:

(i) First, we assume that there exist optimal Lagrange multipliers that close the duality gap for the cone constrained convex problem with a simple or smooth objective function. We provide new computational complexity results for dual first order augmented Lagrangian methods, where the main complexity bounds show that, in order to obtain an $\epsilon$-optimal solution of the original problem, the inexact (fast) gradient augmented Lagrangian algorithms have to perform $\mathcal{O}(1/\epsilon)$ total projections onto the simple primal feasible set and the feasible cone.

(ii) We combine in a novel fashion Nesterov's smoothing technique and the inexact fast gradient method for solving cone constrained optimization problems with a possibly unbounded feasible cone. We show that, in order to obtain an $\epsilon$-optimal solution, the fast gradient method with inexact information performs $\mathcal{O}(\frac{1}{\epsilon^{3/2}}\log(\frac{1}{\epsilon}))$ projections onto the simple primal feasible set and $\mathcal{O}(1/\epsilon)$ projections onto the feasible cone. Thus, our work shows that the inexact fast gradient method based on Nesterov smoothing has worse overall complexity than the one based on augmented Lagrangian smoothing.

(iii) Then, we eliminate the assumption that there exists an optimal Lagrange multiplier for the cone constrained convex problem and we analyze the computational complexity of fast gradient penalty methods. If the objective function is smooth, then we prove that, in order to obtain an $\epsilon$-optimal solution of the original problem, we need to perform $\mathcal{O}(1/\epsilon^{3/2})$ total projections onto the simple primal feasible set. Through an example, we also show that our bounds are tight.

Notations. We denote $\bar{\mathbb{R}} = \mathbb{R} \cup \{+\infty\}$. For $u, v \in \mathbb{R}^n$ we consider the scalar product $\langle u, v\rangle = u^T v$ and the Euclidean norm $\|u\| = \sqrt{u^T u}$. Further, $[u]_U$ denotes the projection of $u$ onto the nonempty closed convex set $U$ and $\mathrm{dist}_U(u) = \|u - [u]_U\|$ its distance to $U$. Moreover, we use the notation $\mathcal{N}_U(u)$ for the normal cone of the convex set $U$ at $u \in U$, defined by $\mathcal{N}_U(u) = \{t \in \mathbb{R}^n : \langle t, u - v\rangle \geq 0 \ \forall v \in U\}$. We also use the notation $\mathcal{B}_r(x) = \{z \in \mathbb{R}^n : \|z - x\| \leq r\}$. For a matrix $G \in \mathbb{R}^{m \times n}$ we use $\|G\|$ for the spectral norm.
2. Problem formulation

In this paper we consider the following cone constrained convex optimization problem:

$$f^* = \min_{u \in U} f(u) \quad \text{s.t.} \quad Gu + g \in \mathcal{K}, \tag{1}$$

where $f : \mathbb{R}^n \to \mathbb{R}$ is a proper, closed, convex function, $U \subseteq \mathrm{dom} f$ is a nonempty, closed, convex set, $G \in \mathbb{R}^{m \times n}$, $g \in \mathbb{R}^m$, and $\mathcal{K} \subseteq \mathbb{R}^m$ is a nonempty, closed, convex cone with polar cone $\mathcal{K}^* = \{v \in \mathbb{R}^m : \langle v, k\rangle \leq 0 \ \forall k \in \mathcal{K}\}$. We denote by $U^* \subseteq \mathbb{R}^n$ the optimal set of the above problem. Note that our formulation and results can be extended to general normed vector spaces. The following assumptions are valid throughout the paper:

Assumption 2.1 The objective function $f$ is strongly convex with constant $\sigma_f \geq 0$:

$$f(\alpha u + (1 - \alpha) v) \leq \alpha f(u) + (1 - \alpha) f(v) - \frac{\sigma_f \alpha (1 - \alpha)}{2}\|u - v\|^2 \quad \forall u, v \in \mathrm{dom} f,\ \alpha \in [0, 1].$$

Note that Assumption 2.1 with $\sigma_f = 0$ is equivalent to convexity of the function $f$.

Assumption 2.2 (i) The feasible set $U$ and the cones $\mathcal{K}$ and $\mathcal{K}^*$ are closed, convex and simple (e.g., the projection onto these sets can be obtained in closed form).

(ii) The convex set $U$ is bounded, i.e. there exists $D_U < \infty$ such that $\max_{u, v \in U} \|u - v\| \leq D_U$.

Note that these assumptions are standard in the context of first order Lagrangian and penalty methods for conic convex programming, see e.g. [?]. Further, a convex function $h : \mathbb{R}^n \to \bar{\mathbb{R}}$ with $U \subseteq \mathrm{dom}\, h$ is called simple if the optimal solution of the following problem can be obtained efficiently (e.g., in closed form):

$$\min_{u \in U} h(u) + \frac{1}{2\mu}\|u - z\|^2 \quad \forall \mu > 0 \text{ and } z \in \mathbb{R}^n.$$

In this paper we assume that the convex objective function $f$ is either simple or has Lipschitz continuous gradient with constant $L_f > 0$ and $\mathrm{dom} f = \mathbb{R}^n$, i.e.:

$$0 \leq f(y) - \big(f(x) + \langle \nabla f(x), y - x\rangle\big) \leq \frac{L_f}{2}\|x - y\|^2 \quad \forall x, y \in \mathbb{R}^n.$$

Our goal is to find an approximate solution of the optimization problem (1). Thus, we introduce the following definition, used in the rest of the paper:

Definition 1 Given the desired accuracy $\epsilon > 0$, the primal point $u_\epsilon \in U$ is an $\epsilon$-optimal solution of the cone constrained convex problem (1) if it satisfies:

$$|f(u_\epsilon) - f^*| \leq \epsilon \quad \text{and} \quad \mathrm{dist}_{\mathcal{K}}(Gu_\epsilon + g) \leq \epsilon.$$
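Definition 1 can be checked mechanically once the distance to $\mathcal{K}$ is computable. The Python sketch below assumes, purely for illustration, that $\mathcal{K} = \mathbb{R}^m_+$ (so the projection is a componentwise $\max(\cdot, 0)$) and that the caller supplies $f(u_\epsilon)$, $f^*$ and the residual $Gu_\epsilon + g$; the function names are hypothetical.

```python
import math


def dist_nonneg_orthant(v):
    """Euclidean distance from v to K = R^m_+ (assumed cone).

    The projection onto R^m_+ is max(v_i, 0), so only negative entries
    contribute to the distance.
    """
    return math.sqrt(sum(min(vi, 0.0) ** 2 for vi in v))


def is_eps_optimal(f_u, f_star, residual, eps):
    """Check Definition 1 for K = R^m_+, where residual = G u + g."""
    return abs(f_u - f_star) <= eps and dist_nonneg_orthant(residual) <= eps


# Example: value within 5e-4 of f*, constraint violated by 4e-4.
print(is_eps_optimal(1.0005, 1.0, [0.3, -0.0004], 1e-3))
```

Both tests of Definition 1 are needed: a point can have the right objective value while violating the conic constraint, or vice versa.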


2.1 A framework for inexact first order methods 

Since the main algorithm in this paper is Nesterov's fast gradient method [21], we further introduce an inexact algorithmic framework based on the method in [?], which will subsequently be invoked in several settings. Therefore, consider the following general convex constrained optimization problem with composite objective function:

$$F^* = \min_{z \in Q} F(z) \ \big(= \phi(z) + \psi(z)\big), \tag{2}$$

where $Q \subseteq \mathbb{R}^n$ is a simple, convex set, $\phi : \mathbb{R}^n \to \mathbb{R}$ is a convex function with Lipschitz continuous gradient of constant $L_\phi > 0$ and $\psi : \mathbb{R}^n \to \mathbb{R}$ is a simple, closed, convex function. Using the definition from [?], given $\delta > 0$ and $L > 0$, we assume that the smooth function $\phi$ is equipped with a first order inexact $(\delta, L)$-oracle, i.e. for any $y \in Q$ we can compute an approximate function value $\phi_{\delta,L}(y)$ and an approximate gradient $\nabla\phi_{\delta,L}(y)$ such that the following inequalities hold:

$$0 \leq \phi(x) - \big(\phi_{\delta,L}(y) + \langle \nabla\phi_{\delta,L}(y), x - y\rangle\big) \leq \frac{L}{2}\|x - y\|^2 + \delta \quad \forall x \in Q. \tag{3}$$
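The oracle inequalities (3) can be tested numerically on sample points. The sketch below is a finite check on assumed data (a quadratic $\phi(x) = \frac{1}{2}\|x - c\|^2$ with $L_\phi = 1$), not a proof, and all names are hypothetical. It also illustrates a simple fact: an oracle that returns the exact gradient but underestimates $\phi(y)$ by a constant $\delta$ is a $(\delta, L)$-oracle, while it violates (3) when $\delta = 0$ is claimed.

```python
import random


def satisfies_oracle(phi, approx_val, approx_grad, y, xs, L, delta, tol=1e-9):
    """Check both inequalities of (3) at the point y for every sample x in xs."""
    fy, gy = approx_val(y), approx_grad(y)
    for x in xs:
        diff = [xi - yi for xi, yi in zip(x, y)]
        gap = phi(x) - (fy + sum(gi * di for gi, di in zip(gy, diff)))
        bound = 0.5 * L * sum(di * di for di in diff) + delta
        if gap < -tol or gap > bound + tol:
            return False
    return True


# Assumed example: phi(x) = 0.5 ||x - c||^2, whose gradient is 1-Lipschitz.
c = [1.0, -2.0]


def phi(x):
    return 0.5 * sum((xi - ci) ** 2 for xi, ci in zip(x, c))


def grad_phi(x):
    return [xi - ci for xi, ci in zip(x, c)]


def shifted_val(x, shift=0.1):
    # Underestimates phi(x) by a constant; used together with the exact gradient.
    return phi(x) - shift


random.seed(0)
samples = [[random.uniform(-5.0, 5.0), random.uniform(-5.0, 5.0)] for _ in range(50)]
```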

Next, we introduce the Inexact Composite Fast Gradient (ICFG) method for solving the composite optimization problem (2) using approximate function values and gradients $(\phi_{\delta,L}(y), \nabla\phi_{\delta,L}(y))$ satisfying the first order $(\delta, L)$-oracle (3):


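Algorithm ICFG($\phi, \psi, \delta, L$)

Given $z^0 = w^1 \in \mathbb{R}^n$ and $\theta_1 = 1$. For $k \geq 1$ do:

(1) Compute the pair $(\phi_{\delta,L}(w^k), \nabla\phi_{\delta,L}(w^k))$ satisfying (3). Update:

(2) $z^k = \arg\min_{z \in Q} \langle \nabla\phi_{\delta,L}(w^k), z - w^k\rangle + \frac{L}{2}\|z - w^k\|^2 + \psi(z)$

(3) $w^{k+1} = z^k + \frac{\theta_k - 1}{\theta_{k+1}}(z^k - z^{k-1})$

(4) If a stopping criterion holds, then STOP and return $(z^k, w^k)$.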
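A minimal Python sketch of Algorithm ICFG, assuming the caller supplies the oracle pair of step (1) and a prox routine for $\psi$ over $Q$ (both are hypothetical interfaces, not taken from the paper). Step (2) is implemented by completing the square: its minimizer is $\arg\min_{z \in Q} \psi(z) + \frac{L}{2}\|z - (w^k - \frac{1}{L}\nabla\phi_{\delta,L}(w^k))\|^2$. The flag `accelerated` switches between the FISTA-type update of $\theta_k$ and the choice $\theta_k = 1$ (ISTA). The toy instance at the end is an assumption, chosen with an interior solution so that $F^* = 0$.

```python
import math


def icfg(oracle, prox, z0, L, iters, accelerated=True):
    """Sketch of Algorithm ICFG(phi, psi, delta, L).

    oracle(w) -> (approx value, approx gradient) satisfying (3);
    prox(v, L) -> argmin_{z in Q} (L/2)||z - v||^2 + psi(z).
    Returns the list of iterates z^1, ..., z^iters.
    """
    z_prev = list(z0)
    w = list(z0)                       # z^0 = w^1
    theta = 1.0                        # theta_1 = 1
    iterates = []
    for _ in range(iters):
        _, g = oracle(w)               # step (1): inexact pair at w^k
        # Step (2) as a prox step at the gradient point w^k - g / L.
        z = prox([wi - gi / L for wi, gi in zip(w, g)], L)
        if accelerated:
            theta_next = (1.0 + math.sqrt(1.0 + 4.0 * theta ** 2)) / 2.0
        else:
            theta_next = 1.0           # ISTA: the momentum coefficient below is 0
        coef = (theta - 1.0) / theta_next
        w = [zi + coef * (zi - zpi) for zi, zpi in zip(z, z_prev)]  # step (3)
        z_prev, theta = z, theta_next
        iterates.append(z)
    return iterates


# Toy instance (assumption): phi(z) = 2 (z1 - 0.3)^2 + 0.005 (z2 - 0.7)^2,
# psi = 0 and Q = [0, 1]^2, so L_phi = 4, z* = (0.3, 0.7) and F* = 0.
def phi_val(z):
    return 2.0 * (z[0] - 0.3) ** 2 + 0.005 * (z[1] - 0.7) ** 2


def exact_oracle(z):
    return phi_val(z), [4.0 * (z[0] - 0.3), 0.01 * (z[1] - 0.7)]


def prox_box(v, L):
    # With psi = 0 the prox step is the projection onto the box Q.
    return [min(max(vi, 0.0), 1.0) for vi in v]
```

With the exact oracle ($\delta = 0$), `icfg(exact_oracle, prox_box, [1.0, 0.0], 4.0, 500)` produces iterates whose function values stay below the bounds of Theorem 2.3.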


Note that if we update $\theta_{k+1} = \frac{1 + \sqrt{1 + 4\theta_k^2}}{2}$ for all $k \geq 1$ and additionally take $\delta = 0$ and $L = L_\phi$, then we recover the well-known FISTA scheme, which was first analyzed in [2] and subsequently extended in different variants in [21, 26]. On the other hand, if we take $\theta_k = 1$ for all $k \geq 1$ and $\delta = 0$, then $z^k = w^{k+1}$ and we recover the ISTA scheme, also developed in [?] and extended in [12, 17, 21]. Using the same reasoning as in [?], we provide in the next theorem the rate of convergence of Algorithm ICFG for the composite optimization problem (2) endowed with a first order inexact $(\delta, L)$-oracle (3). First, let us denote by $z^*$ an optimal solution of the composite convex problem (2).


Theorem 2.3 [?] Let the sequences $(z^k, w^k)_{k \geq 0}$ be generated by Algorithm ICFG($\phi, \psi, \delta, L$) for solving the composite problem (2) endowed with a first order inexact $(\delta, L)$-oracle. Then we have the following convergence rates in terms of function values:

(i) If $\theta_k = 1$ for all $k \geq 1$ and we define the average sequence $\hat{z}^k = \frac{1}{k}\sum_{i=0}^{k-1} z^{i+1}$, then $\hat{z}^k$ has the following sublinear convergence rate in terms of function values:

$$F(\hat{z}^k) - F^* \leq \frac{L\|z^0 - z^*\|^2}{2k} + \delta.$$

(ii) If we update $\theta_{k+1} = \frac{1 + \sqrt{1 + 4\theta_k^2}}{2}$ for all $k \geq 1$, then the last iterate $z^k$ has the following sublinear convergence rate in terms of function values:

$$F(z^k) - F^* \leq \frac{2L\|z^0 - z^*\|^2}{(k+1)^2} + k\delta.$$
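Theorem 2.3 also shows how to split a target accuracy $\epsilon$ between the iteration count and the oracle accuracy. The sketch below makes each of the two terms in the bounds at most $\epsilon/2$; the even split and the function names are illustrative choices, and `R` stands for (an upper bound on) $\|z^0 - z^*\|$.

```python
import math


def gradient_budget(L, R, eps):
    """Theorem 2.3(i): L R^2 / (2k) <= eps/2 and delta <= eps/2."""
    k = math.ceil(L * R ** 2 / eps)
    return k, eps / 2.0


def fast_gradient_budget(L, R, eps):
    """Theorem 2.3(ii): 2 L R^2 / (k+1)^2 <= eps/2 and k * delta <= eps/2."""
    k = max(1, math.ceil(2.0 * math.sqrt(L * R ** 2 / eps) - 1.0))
    return k, eps / (2.0 * k)


k1, d1 = gradient_budget(10.0, 2.0, 1e-3)
k2, d2 = fast_gradient_budget(10.0, 2.0, 1e-3)
print(k1, d1, k2, d2)
```

The design point is the $k\delta$ term in (ii): the fast scheme needs far fewer iterations, but it accumulates oracle errors, so it must be fed a much more accurate oracle ($\delta$ of order $\epsilon^{3/2}$ instead of order $\epsilon$).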
3. Inexact first order Lagrangian methods

In this section we analyze the computational complexity of inexact first order Lagrangian methods for solving the cone constrained convex problem (1). Since we use the dual framework, we require the following standard assumption for dual algorithms, valid only in this section:

Assumption 3.1 There exists a Lagrange multiplier $x^* \in \mathcal{K}^*$ for the conic convex problem (1) that closes the duality gap.

Assumption 3.1 implies the existence of a bounded optimal Lagrange multiplier, that is $\|x^*\| < \infty$, and it holds for (1) whenever a Slater type condition is valid, i.e. there exists $u \in \mathrm{relint}(U)$ such that $Gu + g \in \mathrm{relint}(\mathcal{K})$.


3.1 Preliminaries 


The strongly convex case, i.e. when the objective function $f$ in problem (1) satisfies Assumption 2.1 with $\sigma_f > 0$, has been extensively studied in the literature, see e.g. [?]. Thus, in the rest of our paper, unless explicitly stated otherwise, we assume that the function $f$ is convex, i.e. it satisfies Assumption 2.1 with $\sigma_f = 0$. In the general convex case, the dual function, denoted $d$, is nonsmooth, and thus dual first order methods such as Algorithm ICFG cannot be applied. In order to be able to apply dual first order algorithms, our approach relies on the combination of smoothing techniques and duality. First, we introduce some notation. We note that problem (1) can be reformulated equivalently as:


$$\min f(u) \quad \text{s.t.} \quad u \in U,\ s \in \mathcal{K},\ Gu + g = s. \tag{4}$$


Thus, the Lagrangian and the dual function of the convex problem (4) are given by:

$$\mathcal{L}(u, s, x) = f(u) + \langle x, Gu + g - s\rangle \quad \text{and} \quad d(x) = \min_{u \in U,\, s \in \mathcal{K}} \mathcal{L}(u, s, x).$$

Assumption 3.1 states that there exists a Lagrange multiplier $x^* \in \mathcal{K}^*$ such that $f^* = d(x^*)$, and thus solving the convex problem (1) is equivalent to solving the dual formulation:

$$f^* = \max_{x \in \mathbb{R}^m} d(x). \tag{5}$$


We denote by $X^*$ the set of optimal solutions of the dual problem (5). Various dual subgradient schemes have been developed for solving (5) with $\epsilon$ accuracy, with overall complexities of order $\mathcal{O}(1/\epsilon^2)$ [?]. However, under additional mild assumptions, we aim in this paper at improving the iteration complexity required for solving the conic optimization problem (1) using the dual formulation. First, we rewrite the dual function $d$ in a novel way as a composite function:

$$d(x) = \underbrace{\min_{u \in U}\, \big[f(u) + \langle x, Gu + g\rangle\big]}_{d_U(x)} + \underbrace{\min_{s \in \mathcal{K}}\, \langle -s, x\rangle}_{d_{\mathcal{K}}(x)} = d_U(x) + d_{\mathcal{K}}(x). \tag{6}$$
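To make the second term of the composite reformulation (6) concrete, here is a sketch for the assumed cone $\mathcal{K} = \mathbb{R}^m_+$, whose polar cone is $\mathcal{K}^* = \mathbb{R}^m_-$. It shows that $d_{\mathcal{K}}$ only takes the values $0$ (on $\mathcal{K}^*$) and $-\infty$, so maximizing $d$ implicitly restricts the multiplier to $\mathcal{K}^*$. The function name is hypothetical.

```python
import math


def d_K_orthant(x):
    """Second term of (6) for the assumed cone K = R^m_+: min_{s >= 0} <-s, x>.

    If every x_i <= 0 (i.e. x lies in the polar cone R^m_-), then <-s, x> >= 0
    for all s >= 0 and the minimum 0 is attained at s = 0. If some x_i > 0,
    taking s = t * e_i with t -> infinity drives <-s, x> to -infinity.
    """
    return 0.0 if all(xi <= 0.0 for xi in x) else -math.inf


print(d_K_orthant([-1.0, 0.0]), d_K_orthant([-1.0, 0.5]))
```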
The function $d_{\mathcal{K}}(x) = -\max_{s \in \mathcal{K}} \langle s, x\rangle$ is the negative of the support function of the cone $\mathcal{K}$; by the definition of the polar cone $\mathcal{K}^*$, it equals $0$ for $x \in \mathcal{K}^*$ and $-\infty$ otherwise, i.e. it is the negative of the indicator function of $\mathcal{K}^*$. To our knowledge, there are two widely known smoothing strategies to obtain an approximate dual function with Lipschitz continuous gradient. They are based on the following modified Lagrangian and dual functions:

(i) Augmented Lagrangian smoothing [?]:

$$\mathcal{L}^{ag}_\mu(u, s, x) = f(u) + \langle x, Gu + g - s\rangle + \frac{\mu}{2}\|Gu + g - s\|^2, \qquad d^{ag}_\mu(x) = \min_{u \in U,\, s \in \mathcal{K}} \mathcal{L}^{ag}_\mu(u, s, x).$$

(ii) Nesterov smoothing [?]:

$$\mathcal{L}^{ns}_\mu(u, s, x) = f(u) + \langle x, Gu + g - s\rangle + \frac{\mu}{2}\big(\|u\|^2 + \|s\|^2\big), \qquad d^{ns}_\mu(x) = \min_{u \in U,\, s \in \mathcal{K}} \mathcal{L}^{ns}_\mu(u, s, x).$$


Note that, following the reasoning from [?], the Nesterov smoothing approximation $d^{ns}_\mu(x)$ requires the boundedness of the primal feasible set $\mathcal{K} \times U$. Thus, a general (unbounded) convex cone $\mathcal{K}$ causes difficulties in using this strategy. In Section 3.3 we present a modified Nesterov smoothing technique which is able to cope with linear conic constraints and an unbounded feasible cone $\mathcal{K}$, based on the new composite reformulation (6).


3.2 Inexact first order methods for augmented Lagrangian smoothing 

In this section, we analyze the iteration complexity of inexact first order methods for augmented Lagrangian smoothing, under Assumption 2.1 with $\sigma_f = 0$, Assumption 2.2 and Assumption 3.1. The inexact gradient Lagrangian method is equivalent to the classical augmented Lagrangian algorithm, namely the application of the inexact gradient method to the augmented dual function. The second first order Lagrangian method we analyze is the inexact fast gradient Lagrangian method, which is based on the application of the fast gradient method to the augmented dual function. We start with the classical augmented Lagrangian setting, i.e. after minimizing $\mathcal{L}^{ag}_\mu(u, s, x)$ over $s \in \mathcal{K}$ in closed form, we define [?]:

$$\mathcal{L}^{ag}_\mu(u, x) = f(u) + \frac{\mu}{2}\mathrm{dist}_{\mathcal{K}}\Big(Gu + g + \frac{1}{\mu}x\Big)^2 - \frac{1}{2\mu}\|x\|^2 \quad \text{and} \quad d^{ag}_\mu(x) = \min_{u \in U} \mathcal{L}^{ag}_\mu(u, x).$$
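The closed form of $\mathcal{L}^{ag}_\mu(u, x)$ above comes from minimizing $\langle x, Gu + g - s\rangle + \frac{\mu}{2}\|Gu + g - s\|^2$ over $s \in \mathcal{K}$ by completing the square, with minimizer $s = [Gu + g + \frac{1}{\mu}x]_{\mathcal{K}}$. The sketch below checks this identity numerically for the assumed cone $\mathcal{K} = \mathbb{R}^m_+$, writing `r` for $Gu + g$ at a fixed $u$; the helper names are hypothetical.

```python
import random


def proj_orthant(v):
    """Projection onto K = R^m_+ (assumed cone)."""
    return [max(vi, 0.0) for vi in v]


def inner_value(r, s, x, mu):
    """<x, r - s> + (mu/2) ||r - s||^2, where r = G u + g for a fixed u."""
    d = [ri - si for ri, si in zip(r, s)]
    return sum(xi * di for xi, di in zip(x, d)) + 0.5 * mu * sum(di * di for di in d)


def minimizer_s(r, x, mu):
    """s = [r + x/mu]_K, obtained by completing the square."""
    return proj_orthant([ri + xi / mu for ri, xi in zip(r, x)])


def closed_form(r, x, mu):
    """(mu/2) dist_K(r + x/mu)^2 - ||x||^2 / (2 mu)."""
    v = [ri + xi / mu for ri, xi in zip(r, x)]
    dist2 = sum((vi - pi) ** 2 for vi, pi in zip(v, proj_orthant(v)))
    return 0.5 * mu * dist2 - sum(xi * xi for xi in x) / (2.0 * mu)


random.seed(1)
r = [random.uniform(-3.0, 3.0) for _ in range(4)]
x = [random.uniform(-3.0, 3.0) for _ in range(4)]
mu = 2.5
print(inner_value(r, minimizer_s(r, x, mu), x, mu), closed_form(r, x, mu))
```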

Note that the augmented dual function is a pure Moreau approximation of the original dual function:

$$d^{ag}_\mu(x) = \max_{z \in \mathbb{R}^m} d(z) - \frac{1}{2\mu}\|z - x\|^2 = \max_{z \in \mathcal{K}^*} d_U(z) - \frac{1}{2\mu}\|z - x\|^2.$$

Further, we observe that the partial gradient of $\mathcal{L}^{ag}_\mu$ w.r.t. $x$ is given by:

$$\nabla_x \mathcal{L}^{ag}_\mu(u, x) = Gu + g - \Big[Gu + g + \frac{1}{\mu}x\Big]_{\mathcal{K}}.$$
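A finite-difference check of the partial gradient formula, again for the assumed cone $\mathcal{K} = \mathbb{R}^m_+$ and a fixed $u$, so that $f(u)$ and $r = Gu + g$ are constants and only the $x$-dependent part of $\mathcal{L}^{ag}_\mu$ is coded. The sample point is chosen away from the kinks of $\max(\cdot, 0)$, and all names are hypothetical.

```python
def lag_x_part(r, x, mu):
    """x-dependent part of L^ag_mu(u, x) for K = R^m_+ and fixed r = G u + g."""
    v = [ri + xi / mu for ri, xi in zip(r, x)]
    dist2 = sum(min(vi, 0.0) ** 2 for vi in v)   # squared distance to R^m_+
    return 0.5 * mu * dist2 - sum(xi * xi for xi in x) / (2.0 * mu)


def grad_x(r, x, mu):
    """Partial gradient formula: G u + g - [G u + g + x/mu]_K."""
    return [ri - max(ri + xi / mu, 0.0) for ri, xi in zip(r, x)]


def finite_diff_grad(r, x, mu, h=1e-6):
    """Central differences of lag_x_part in each coordinate of x."""
    g = []
    for i in range(len(x)):
        xp = list(x)
        xm = list(x)
        xp[i] += h
        xm[i] -= h
        g.append((lag_x_part(r, xp, mu) - lag_x_part(r, xm, mu)) / (2.0 * h))
    return g


r = [0.4, -1.2, 2.0]
x = [1.0, 0.5, -3.0]
mu = 2.0
print(grad_x(r, x, mu), finite_diff_grad(r, x, mu))
```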
For any $x \in \mathbb{R}^m$ we denote a primal exact solution by $u^*_\mu(x) \in \arg\min_{u \in U} \mathcal{L}^{ag}_\mu(u, x)$. It is well known, see e.g. [25], that the gradient of the augmented dual function $d^{ag}_\mu(x)$ satisfies:

$$\nabla d^{ag}_\mu(x) = Gu^*_\mu(x) + g - \Big[Gu^*_\mu(x) + g + \frac{1}{\mu}x\Big]_{\mathcal{K}},$$

and, additionally, it is Lipschitz continuous with constant $L_d = \frac{1}{\mu}$. Moreover, the resulting augmented dual problem, given by:

$$f^* = \max_{x \in \mathbb{R}^m} d^{ag}_\mu(x), \quad \text{satisfies} \quad X^* = \arg\max_{x \in \mathbb{R}^m} d^{ag}_\mu(x).$$

In most practical applications it is difficult to compute the optimal solution $u^*_\mu(x)$ of the inner problem $\min_{u \in U} \mathcal{L}^{ag}_\mu(u, x)$, and we can obtain only an approximate solution.

Assume that we solve the inner problem inexactly and obtain an approximate solution $u_\mu(x) \in U$ which, for a given accuracy $\delta > 0$, satisfies:

$$0 \leq \mathcal{L}^{ag}_\mu(u_\mu(x), x) - d^{ag}_\mu(x) \leq \delta \quad \forall x \in \mathbb{R}^m. \tag{7}$$

Then we can construct a first order inexact oracle for the augmented dual function:

Theorem 3.2 Let $\mu, \delta > 0$. Then we have the following first order inexact $(3\delta, 2L_d)$-oracle for the augmented dual function $d^{ag}_\mu$:

$$0 \leq \mathcal{L}^{ag}_\mu(u_\mu(y), y) + \langle \nabla_x \mathcal{L}^{ag}_\mu(u_\mu(y), y), x - y\rangle - d^{ag}_\mu(x) \leq L_d\|x - y\|^2 + 3\delta, \tag{8}$$

for all $x, y \in \mathbb{R}^m$, where the approximate solution $u_\mu(y)$ satisfies (7) and $L_d = \frac{1}{\mu}$.
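Proof. For the left hand side inequality of (8), we observe that, since $\mathcal{L}^{ag}_\mu(u, \cdot)$ is concave (as a minimum over $s \in \mathcal{K}$ of affine functions of $x$):

$$\mathcal{L}^{ag}_\mu(u_\mu(y), y) + \langle \nabla_x \mathcal{L}^{ag}_\mu(u_\mu(y), y), x - y\rangle \geq \mathcal{L}^{ag}_\mu(u_\mu(y), x) \geq \min_{u \in U} \mathcal{L}^{ag}_\mu(u, x) = d^{ag}_\mu(x).$$

For the right hand side inequality of (8), note that for any fixed $u \in U$ the function $h(x) = \mathcal{L}^{ag}_\mu(u, x) - d^{ag}_\mu(x)$ has Lipschitz continuous gradient with constant $2L_d = 2/\mu$ and $h(x) \geq 0$ for all $x \in \mathbb{R}^m$. Therefore, using the notation $L_d = 1/\mu$, we have:

$$h(x) - \min_{x \in \mathbb{R}^m} h(x) \geq \frac{1}{4L_d}\|\nabla h(x)\|^2 = \frac{1}{4L_d}\big\|\nabla_x \mathcal{L}^{ag}_\mu(u, x) - \nabla d^{ag}_\mu(x)\big\|^2 \quad \forall u \in U.$$

Taking $u = u_\mu(x)$ and using the definition of $u_\mu(x)$, we have $h(x) - \min_{x \in \mathbb{R}^m} h(x) \leq h(x) \leq \delta$ and obtain the following approximate gradient relation:

$$\big\|\nabla_x \mathcal{L}^{ag}_\mu(u_\mu(x), x) - \nabla d^{ag}_\mu(x)\big\| \leq \sqrt{4L_d\delta}. \tag{9}$$

From the Lipschitz continuity of $\nabla d^{ag}_\mu$, (7) and (9), we have that for any $x, y \in \mathbb{R}^m$ the following relations hold:

$$\begin{aligned}
d^{ag}_\mu(x) &\geq d^{ag}_\mu(y) + \langle \nabla d^{ag}_\mu(y), x - y\rangle - \frac{L_d}{2}\|x - y\|^2 \\
&\overset{(7)}{\geq} \mathcal{L}^{ag}_\mu(u_\mu(y), y) + \langle \nabla d^{ag}_\mu(y), x - y\rangle - \frac{L_d}{2}\|x - y\|^2 - \delta \\
&= \mathcal{L}^{ag}_\mu(u_\mu(y), y) + \langle \nabla_x \mathcal{L}^{ag}_\mu(u_\mu(y), y), x - y\rangle - \frac{L_d}{2}\|x - y\|^2 - \delta + \langle \nabla d^{ag}_\mu(y) - \nabla_x \mathcal{L}^{ag}_\mu(u_\mu(y), y), x - y\rangle \\
&\geq \mathcal{L}^{ag}_\mu(u_\mu(y), y) + \langle \nabla_x \mathcal{L}^{ag}_\mu(u_\mu(y), y), x - y\rangle - \frac{L_d}{2}\|x - y\|^2 - \delta - \big\|\nabla d^{ag}_\mu(y) - \nabla_x \mathcal{L}^{ag}_\mu(u_\mu(y), y)\big\|\,\|x - y\| \\
&\overset{(9)}{\geq} \mathcal{L}^{ag}_\mu(u_\mu(y), y) + \langle \nabla_x \mathcal{L}^{ag}_\mu(u_\mu(y), y), x - y\rangle - \frac{L_d}{2}\|x - y\|^2 - \delta - \sqrt{4\delta L_d}\,\|x - y\|.
\end{aligned}$$

On the other hand, for any pair of positive constants $(t, a)$ we have $at \leq \frac{a^2}{2} + \frac{t^2}{2}$. Thus, taking $t = \sqrt{L_d}\|x - y\|$ and $a = 2\sqrt{\delta}$, we get $\sqrt{4\delta L_d}\|x - y\| \leq 2\delta + \frac{L_d}{2}\|x - y\|^2$, and the previous inequalities yield the right hand side inequality of the theorem. ■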
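Theorem 3.2 can be sanity-checked numerically on a small instance. The sketch below uses a hypothetical one-dimensional problem (not from the paper): $f(u) = \frac{1}{2}(u - 2)^2$, $U = [-1, 1]$, $G = -1$, $g = 0.5$ and $\mathcal{K} = \mathbb{R}_+$. The exact inner minimization is replaced by a ternary search (valid because $\mathcal{L}^{ag}_\mu(\cdot, x)$ is strictly convex here), the inexact point $u_\mu(y)$ is deliberately taken as far from $u^*_\mu(y)$ as criterion (7) allows on a grid, and both inequalities of (8) are then tested on sample points. Passing is evidence on finitely many samples, not a proof.

```python
# Hypothetical 1-D instance: f(u) = 0.5 (u - 2)^2, U = [-1, 1],
# G = -1, g = 0.5, K = R_+ (i.e. the constraint u <= 0.5).

def lag(u, x, mu):
    """L^ag_mu(u, x) = f(u) + (mu/2) dist_K(Gu + g + x/mu)^2 - x^2/(2 mu)."""
    v = (0.5 - u) + x / mu
    return 0.5 * (u - 2.0) ** 2 + 0.5 * mu * min(v, 0.0) ** 2 - x * x / (2.0 * mu)


def grad_lag_x(u, x, mu):
    """Partial gradient Gu + g - [Gu + g + x/mu]_K."""
    r = 0.5 - u
    return r - max(r + x / mu, 0.0)


def argmin_U(x, mu, lo=-1.0, hi=1.0, iters=200):
    """Ternary search for the minimizer of the strictly convex lag(., x, mu) on U."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if lag(m1, x, mu) <= lag(m2, x, mu):
            hi = m2
        else:
            lo = m1
    return 0.5 * (lo + hi)


def d_ag(x, mu):
    """Augmented dual function, evaluated to near machine precision."""
    return lag(argmin_U(x, mu), x, mu)


def inexact_u(y, mu, delta, n_grid=2001):
    """A grid point of U satisfying (7) at y, as far from u*(y) as possible."""
    u_star, d_y = argmin_U(y, mu), d_ag(y, mu)
    grid = [-1.0 + 2.0 * i / (n_grid - 1) for i in range(n_grid)]
    ok = [u for u in grid if lag(u, y, mu) - d_y <= delta]
    return max(ok, key=lambda u: abs(u - u_star)) if ok else u_star


def check_oracle_8(mu, delta, ys, xs, tol=1e-9):
    """Test both inequalities of (8), with L_d = 1/mu, on the given samples."""
    L_d = 1.0 / mu
    d_vals = {x: d_ag(x, mu) for x in xs}
    for y in ys:
        u = inexact_u(y, mu, delta)
        val, grad = lag(u, y, mu), grad_lag_x(u, y, mu)
        for x in xs:
            gap = val + grad * (x - y) - d_vals[x]
            if gap < -tol or gap > L_d * (x - y) ** 2 + 3.0 * delta + tol:
                return False
    return True


ys = [-2.0, -0.5, 0.0, 1.0, 3.0]
xs = [-4.0 + 0.2 * i for i in range(41)]
print(check_oracle_8(0.5, 0.01, ys, xs))
```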


The relation (8) implies that the augmented dual function $d^{ag}_\mu$ is smooth and is equipped with a first order inexact $(3\delta, 2L_d)$-oracle, with $\phi_{3\delta, 2L_d}(x) = \mathcal{L}^{ag}_\mu(u_\mu(x), x)$ and $\nabla\phi_{3\delta, 2L_d}(x) = \nabla_x \mathcal{L}^{ag}_\mu(u_\mu(x), x) = Gu_\mu(x) + g - [Gu_\mu(x) + g + \frac{1}{\mu}x]_{\mathcal{K}}$. It is important to note that many previous results on augmented Lagrangian methods require solving the inner problem with a much higher inner accuracy, of order $\mathcal{O}(\delta^2)$ (see e.g. [?]), i.e.:

$$\mathcal{L}^{ag}_\mu(u_\mu(x), x) - d^{ag}_\mu(x) \leq \mathcal{O}(\delta^2).$$

Our approach here is less conservative: it requires the inner problem to be solved only with accuracy of order $\delta$, as in (7). As we will see in the sequel, this new result from Theorem 3.2 has a significant impact on the computational complexity of our methods compared to those given in previous papers. In particular, the first order inexact oracle derived in [6] for the augmented Lagrangian dual function is more conservative than the one from Theorem 3.2, and thus its direct application leads to much worse computational complexities than the ones we obtain in the present paper based on Theorem 3.2.

Given the pair $(x^k, y^k)_{k \geq 0}$ generated by Algorithm ICFG, in the following two sections we provide complexity estimates related to the convergence of the average primal point $(\hat{u}^k)_{k \geq 1}$, defined in a compact way as follows:

$$\hat{u}^k = \frac{1}{S_k}\sum_{i=0}^{k-1} \theta_i u^i, \quad \text{where} \quad S_k = \sum_{i=0}^{k-1} \theta_i \quad \text{and} \quad u^i = u_\mu(x^i). \tag{10}$$

Notice that the weights $\theta_k$ are either constant, i.e. $\theta_k = 1$ for all $k \geq 0$, or updated as $\theta_0 = 0$, $\theta_1 = 1$ and $\theta_{k+1} = \frac{1 + \sqrt{1 + 4\theta_k^2}}{2}$ for all $k \geq 1$.
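The averaging (10) is straightforward to compute. The sketch below (function names hypothetical) builds the weights $\theta_0, \dots, \theta_{k-1}$ for both choices. For the accelerated weights, the update gives $\theta_{i+1}^2 - \theta_{i+1} = \theta_i^2$, and induction then yields $S_k = \theta_{k-1}^2$, so $S_k$ grows quadratically in $k$; the test checks this identity numerically.

```python
import math


def theta_weights(k, accelerated):
    """Weights theta_0, ..., theta_{k-1} used in (10)."""
    if not accelerated:
        return [1.0] * k
    th = [0.0, 1.0]                                   # theta_0 = 0, theta_1 = 1
    while len(th) < k:
        th.append((1.0 + math.sqrt(1.0 + 4.0 * th[-1] ** 2)) / 2.0)
    return th[:k]


def averaged_primal(us, accelerated):
    """hat u^k = (1/S_k) sum_{i<k} theta_i u^i, with S_k = sum_{i<k} theta_i."""
    th = theta_weights(len(us), accelerated)
    S = sum(th)
    if S == 0.0:
        raise ValueError("need at least one point (two for accelerated weights)")
    return [sum(t * u[j] for t, u in zip(th, us)) / S for j in range(len(us[0]))]


print(averaged_primal([[1.0], [3.0], [5.0]], accelerated=True))
```

Note that with the accelerated weights the first primal point $u^0$ receives zero weight, since $\theta_0 = 0$.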
3.2.1 Inexact gradient augmented Lagrangian method

We now analyze the overall complexity of the classical augmented Lagrangian method in terms of projections onto the cone $\mathcal{K}$ and the simple feasible set $U$, under various assumptions on the objective function $f$. A direct consequence of Theorem 3.2 and Theorem 2.3 is the following iteration complexity (in terms of outer iterations) of the inexact gradient augmented Lagrangian method.

Corollary 3.3 Under Assumptions 2.1 with $\sigma_f = 0$, 2.2 and 3.1, let $\mu, \delta > 0$ and $(x^k)_{k \geq 0}$ be the sequence generated by Algorithm ICFG($d^{ag}_\mu, 0, 3\delta, 2L_d$) with $\theta_k = 1$ for all $k \geq 0$ and $L_d = \frac{1}{\mu}$. Define the average sequence $(\hat{x}^k)_{k \geq 1}$ by $\hat{x}^k = \frac{1}{k}\sum_{i=0}^{k-1} x^{i+1}$. Then we have the following convergence estimate on dual suboptimality:

$$f^* - d^{ag}_\mu(\hat{x}^k) \leq \frac{L_d R_d^2}{k} + 3\delta.$$

Note that the above convergence rate is linked only to the number of the outer iterations 
and omits the complexity of solving the inner subproblem at step 1 of ICFG. Before 
estimating the total complexity of the process containing the inner and outer levels, we 
provide convergence rates for the primal infeasibility and suboptimality. 

Theorem 3.4 Under Assumptions 2.1 with $\sigma_f = 0$, 2.2 and 3.1, let $\mu, \delta > 0$ and $(x^k)_{k \geq 0}$ be the sequence generated by Algorithm ICFG($d^{ag}_\mu, 0, 3\delta, 2L_d$) with $\theta_k = 1$ for all $k \geq 0$ and $L_d = \frac{1}{\mu}$. Let $u^i = u_\mu(x^i)$ be such that $\mathcal{L}^{ag}_\mu(u^i, x^i) - d^{ag}_\mu(x^i) \leq \delta$ for $0 \leq i \leq k$. Then the average primal sequence $(\hat{u}^k)_{k \geq 1}$ defined by (10) satisfies the following relations:

(i) The primal infeasibility is bounded sublinearly as follows:

$$\mathrm{dist}_{\mathcal{K}}(G\hat{u}^k + g) \leq \frac{4L_d R_d}{k} + \sqrt{\frac{12L_d\delta}{k}}.$$

(ii) The primal suboptimality gap is bounded by:

$$-\left(\frac{4L_d R_d^2}{k} + R_d\sqrt{\frac{12L_d\delta}{k}}\right) \leq f(\hat{u}^k) - f^* \leq \frac{L_d\|x^0\|^2}{k} + \delta.$$

Proof. To make the results easier to read, we provide the proofs of the primal infeasibility and suboptimality bounds in Appendix A.1. ■

Note that, using the above rate of convergence, one might choose an appropriate value of the smoothing parameter $\mu$ such that an $\epsilon$-optimal point is obtained after a single outer iteration. Thus, convergence rates in terms of outer iterations are not relevant in this case, and it is natural to analyze the computational complexity of Algorithm ICFG by also taking into account the complexity of solving the inner subproblems. Therefore, we also need to count the number of fast gradient steps, which include projections onto $U$ and $\mathcal{K}$, matrix-vector products $Gu$ and $G^T x$, or gradient computations $\nabla f(u)$, performed in order to attain the required inner accuracy at a given outer iteration. Since in the literature this is usually measured in terms of projections onto $U$ and $\mathcal{K}$ (see e.g. [?]), we also use this measure of computational complexity. We further analyze the number of inner projections that the inexact gradient Lagrangian method has to perform at each outer iteration. A well-known fact that we use further is that the function $u \mapsto \frac{1}{2}\mathrm{dist}_{\mathcal{K}}(Gu + g)^2$ has Lipschitz continuous gradient with constant $\|G\|^2$ [?]. Using this observation, depending on the assumptions on the function $f$, we have the following inner iteration complexities for approximately solving the inner problem $\min_{u \in U} \mathcal{L}^{ag}_\mu(u, x)$ for a given $x$:

(i) If the function $f$ is simple, then Algorithm ICFG($\frac{\mu}{2}\mathrm{dist}_{\mathcal{K}}(G\,\cdot + g + \frac{1}{\mu}x)^2, f, 0, \mu\|G\|^2$) returns a primal point $u_\mu(x) \in U$ such that $\mathcal{L}^{ag}_\mu(u_\mu(x), x) - d^{ag}_\mu(x) \leq \delta$ after:

$$N^{in} = D_U\|G\|\sqrt{\frac{2\mu}{\delta}}$$

projections onto the primal simple feasible set $\mathcal{K} \times U$.

(ii) If the function $f$ is not simple, but its gradient $\nabla f$ is Lipschitz continuous with constant $L_f > 0$, then Algorithm ICFG($\mathcal{L}^{ag}_\mu(\cdot, x), 0, 0, L_f + \mu\|G\|^2$) returns a primal point $u_\mu(x) \in U$ such that $\mathcal{L}^{ag}_\mu(u_\mu(x), x) - d^{ag}_\mu(x) \leq \delta$ after:

$$N^{in} = D_U\sqrt{\frac{2(L_f + \mu\|G\|^2)}{\delta}} \tag{11}$$

projections onto the primal simple feasible set $\mathcal{K} \times U$.

Note that if we take $L_f = 0$ in the iteration complexity (11), we recover the convergence rate for the case when $f$ is a simple function. Therefore, for a uniform complexity analysis, we provide in the following result an upper bound on the total number of projections (for an optimal smoothing parameter $\mu$) performed by Algorithm ICFG, which depends on $L_f$ in the following sense: with some abuse of notation, for a simple function $f$ we make the convention that $L_f = 0$, and thus we obtain the computational complexity for simple functions; otherwise, if $L_f > 0$, we recover the overall complexity for the case when $\nabla f$ is $L_f$-Lipschitz continuous. Moreover, we assume for simplicity that $x^0 = 0$.
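The inner bound is an instance of Theorem 2.3(ii) with an exact oracle: assuming the inner method is started in $U$, so that $\|z^0 - z^*\| \leq D_U$, $N$ fast gradient steps give an inner accuracy of at most $2LD_U^2/(N+1)^2$ with $L = L_f + \mu\|G\|^2$, which is at most $\delta$ once $N \geq D_U\sqrt{2L/\delta}$. The small calculator below (hypothetical name; rounding up with a ceiling is an illustrative choice) evaluates this count and makes the trade-off visible: the inner cost grows like $\sqrt{\mu}$, while a larger $\mu$ reduces the number of outer iterations.

```python
import math


def inner_iterations(D_U, G_norm, mu, delta, L_f=0.0):
    """Fast gradient steps sufficient for inner accuracy delta.

    L_f = 0 encodes the convention for a simple objective f, in which case
    the count reduces to D_U * ||G|| * sqrt(2 mu / delta).
    """
    L = L_f + mu * G_norm ** 2        # Lipschitz constant of the smooth part
    return math.ceil(D_U * math.sqrt(2.0 * L / delta))


print(inner_iterations(1.3, 2.5, 3.0, 1e-3), inner_iterations(1.3, 2.5, 3.0, 1e-3, L_f=5.0))
```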

Theorem 3.5 Under Assumptions 2.1 with $\sigma_f = 0$, 2.2 and 3.1, let $\mu, \epsilon, \delta > 0$ and $(x^k)_{k \geq 0}$ be generated by Algorithm ICFG($d^{ag}_\mu, 0, 3\delta, 2L_d$) with $\theta_k = 1$ for all $k \geq 0$. Assume that at each outer iteration $k$, Algorithm ICFG($\frac{\mu}{2}\mathrm{dist}_{\mathcal{K}}(G\,\cdot + g + \frac{1}{\mu}x^k)^2, f, 0, \mu\|G\|^2$) (if $f$ is simple, with the convention $L_f = 0$) or Algorithm ICFG($\mathcal{L}^{ag}_\mu(\cdot, x^k), 0, 0, L_f + \mu\|G\|^2$) (if $\nabla f$ is $L_f$-Lipschitz continuous with $L_f > 0$) is called to solve the inner problem and obtain a primal approximate solution $u^k = u_\mu(x^k)$ such that $\mathcal{L}^{ag}_\mu(u^k, x^k) - d^{ag}_\mu(x^k) \leq \delta$. Then, by setting the optimal smoothing parameter:

and 5 = - (12) 

the average primal point $\hat{u}^k$ defined by (10) is $\epsilon$-optimal after a total number of


24 LfDfj | 6\\G\\DuR d 
e e 



p = max 


16R 2 


L f 


’ lie'll 1 


projections onto the primal simple feasible set $\mathcal{K} \times U$.
Proof. Using the inner accuracy from (12) in Theorem 3.4, the outer iteration complexity of the augmented Lagrangian method is given by:


’16 L d R 2 d 


[16^1 

e 


lie 


(13) 


Based on the general inner complexity (11), we are able to tackle both cases: when $f$ is simple or when $\nabla f$ is $L_f$-Lipschitz continuous. Minimizing the upper bound on the product $N^{out} N^{in}$ over positive parameters $\mu$, we get that the value of $\mu$ given in (12) is optimal up to a constant w.r.t. the total complexity. Combining (12) with (11), we obtain the following bound on the overall complexity:


N° ut Nf < 


< 


16 R 2 d 

lie 


D J 6(L /+ „|GF) +1 


+ I j 1U,; 

lie 


6(^/ + ^||g|| 2 ) 


m, Wu fu± + 


6||G|Pfi2 


+ 1 < 


24 L f Du | 6Du\\G\\R d | l 


Note that if we take $L_f = 0$ we obtain an upper bound on the overall complexity of Algorithm ICFG for the case when $f$ is a simple function. ■

Remark 1 It is interesting to observe that, by choosing the smoothing parameter $\mu$ large enough that the outer iteration estimate (13) equals one, the inexact gradient augmented Lagrangian method terminates after solving the inner subproblem only once. In other words, if an upper bound on $R_d$ is known, then, starting from an arbitrary dual initial point $x^0 \in \mathbb{R}^m$, it is sufficient to compute $u_\mu(x^0) \in U$ satisfying $\mathcal{L}^{ag}_\mu(u_\mu(x^0), x^0) - d^{ag}_\mu(x^0) \leq \epsilon$ to obtain an $\epsilon$-optimal solution of (1). In particular, if $x^0 = 0$, then the inner subproblem has the form $\min_{u \in U} f(u) + \frac{\mu}{2}\mathrm{dist}_{\mathcal{K}}(Gu + g)^2$, which can be seen as a differentiable penalty problem.

We conclude that, in the case of known $R_d$, the gradient augmented Lagrangian method is similar to the quadratic penalty method. ■
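Remark 1 can be illustrated on a hypothetical two-variable instance (not from the paper): $f(u) = \frac{1}{2}\|u - a\|^2$ with $a = (1, 1)$, $U = [-2, 2]^2$ and the single constraint $1 - u_1 - u_2 \in \mathcal{K} = \mathbb{R}_+$, i.e. $G = [-1, -1]$ and $g = 1$. The penalty subproblem obtained with $x^0 = 0$ is solved below by projected gradient steps (the $\theta_k = 1$ variant of ICFG) with $L = L_f + \mu\|G\|^2 = 1 + 2\mu$. By symmetry its solution is $u_1 = u_2 = \frac{1 + \mu}{1 + 2\mu}$, so the constraint violation equals $\frac{1}{1 + 2\mu}$, which the test checks; this is the $\mathcal{O}(1/\mu)$ decay typical of quadratic penalties.

```python
def penalty_solve(mu, iters=20000):
    """Projected gradient on min_{u in U} f(u) + (mu/2) dist_K(Gu + g)^2."""
    a = (1.0, 1.0)
    L = 1.0 + 2.0 * mu                  # L_f + mu * ||G||^2 with ||G||^2 = 2
    u = [0.0, 0.0]
    for _ in range(iters):
        r = 1.0 - u[0] - u[1]           # G u + g with G = [-1, -1], g = 1
        viol = min(r, 0.0)              # (Gu + g) - [Gu + g]_K for K = R_+
        # gradient of f plus mu * G^T * viol (each entry of G is -1)
        grad = [u[i] - a[i] - mu * viol for i in range(2)]
        # gradient step followed by projection onto the box U
        u = [min(max(u[i] - grad[i] / L, -2.0), 2.0) for i in range(2)]
    return u


def infeasibility(u):
    """dist_K(Gu + g) = max(u1 + u2 - 1, 0)."""
    return max(u[0] + u[1] - 1.0, 0.0)


for mu in (1.0, 10.0, 100.0):
    print(mu, infeasibility(penalty_solve(mu)))
```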


3.2.2 Inexact fast gradient augmented Lagrangian method 

We further incorporate Nesterov's accelerated step into the classical augmented Lagrangian method, i.e. we apply the inexact fast gradient method to the augmented dual problem. We analyze the overall complexity of the inexact fast gradient augmented Lagrangian method under Assumptions 2.1 with $\sigma_f = 0$, 2.2 and 3.1. Using the inexact oracle relation (8) and Theorem 2.3, we immediately obtain the following iteration complexity (in terms of outer iterations) of the fast gradient method:


Corollary 3.6 Under Assumptions 2.1 with $\sigma_f = 0$, 2.2 and 3.1, let $\mu, \delta > 0$ and $(x^k, y^k)_{k\ge0}$ be the sequences generated by Algorithm ICFG($d^{ag}_\mu$, 0, $3\delta$, $2L_d$) with $\theta_{k+1} = \frac{1+\sqrt{1+4\theta_k^2}}{2}$ for all $k \ge 1$. Then, we have the following estimate on dual suboptimality:

$$f^* - d^{ag}_\mu(x^k) \le \frac{4L_dR_d^2}{(k+1)^2} + 3k\delta.$$
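To make the $\theta_k$ recursion concrete, the following Python sketch runs an accelerated gradient ascent loop with the update $\theta_{k+1} = \frac{1+\sqrt{1+4\theta_k^2}}{2}$ on a toy concave function. It is not Algorithm ICFG itself: it assumes exact gradients, no nonsmooth term and a known Lipschitz constant `L`, and the name `fast_gradient_ascent` is hypothetical.

```python
import math
from typing import Callable, List

Vector = List[float]


def fast_gradient_ascent(grad: Callable[[Vector], Vector], x0: Vector,
                         L: float, iters: int) -> Vector:
    """Accelerated gradient ascent for a concave function with L-Lipschitz gradient.

    Illustrative only: exact gradients, no prox or projection step.
    """
    x_prev = list(x0)
    y = list(x0)
    theta = 1.0
    for _ in range(iters):
        g = grad(y)
        # Gradient (ascent) step of length 1/L from the extrapolated point y.
        x = [yi + gi / L for yi, gi in zip(y, g)]
        # theta_{k+1} = (1 + sqrt(1 + 4 theta_k^2)) / 2
        theta_next = (1.0 + math.sqrt(1.0 + 4.0 * theta * theta)) / 2.0
        beta = (theta - 1.0) / theta_next
        # Nesterov extrapolation between the two most recent iterates.
        y = [xi + beta * (xi - xpi) for xi, xpi in zip(x, x_prev)]
        x_prev, theta = x, theta_next
    return x_prev


if __name__ == "__main__":
    # Toy concave function d(x) = -(x1 - 3)^2 - 2 (x2 + 1)^2; its gradient is 4-Lipschitz.
    grad_d = lambda x: [-2.0 * (x[0] - 3.0), -4.0 * (x[1] + 1.0)]
    print(fast_gradient_ascent(grad_d, [0.0, 0.0], L=4.0, iters=200))
```

In the inexact setting of Corollary 3.6, `grad` would be replaced by the approximate oracle output, and the resulting errors accumulate along the iterations, which is what the $3k\delta$ term accounts for.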
Note that the above convergence rate is linked only to the number of the outer iterations
and omits the complexity of solving the inner subproblem at step 1. Before estimating
the total complexity of the process containing the inner and outer levels, we provide
convergence rates for the primal infeasibility and suboptimality.

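Theorem 3.7 Under Assumptions 2.1 with $\sigma_f = 0$, 2.2 and 3.1, let $\mu, \delta > 0$ and $(x^k, y^k)_{k\ge0}$ be the sequences generated by Algorithm ICFG($d^{ag}_\mu$, 0, $3\delta$, $2L_d$) with $\theta_{k+1} = \frac{1+\sqrt{1+4\theta_k^2}}{2}$ for all $k \ge 1$. Let $u^i = u_\mu(x^i)$ be such that $\mathcal{L}^{ag}_\mu(u^i, x^i) - d^{ag}_\mu(x^i) \le \delta$ for $0 \le i \le k$. Then, the average primal sequence $(\hat{u}^k)_{k\ge1}$ defined by (10) satisfies:

(i) The primal infeasibility is bounded sublinearly as follows:

$$\mathrm{dist}_{\mathcal{K}}(G\hat{u}^k + g) \le \frac{8L_dR_d}{k^2} + 8\sqrt{\frac{6L_d\delta}{k}}.$$

(ii) The primal suboptimality gap is bounded by:

$$-\left(\frac{8L_dR_d^2}{k^2} + 8R_d\sqrt{\frac{6L_d\delta}{k}}\right) \le f(\hat{u}^k) - f^* \le \frac{8L_d\|x^0\|^2}{k^2} + 3k\delta.$$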

Proof. We provide the proof of the primal infeasibility and suboptimality gap bounds in Appendix A.2. ■

The number of inner iterations that the inexact fast gradient augmented Lagrangian method has to perform at each outer iteration is given by (11). As in the previous section, in the following result we provide the total number of projections performed by Algorithm ICFG for simple objective functions (i.e., with the convention $L_f = 0$) and for objective functions with Lipschitz continuous gradient (i.e., $L_f > 0$). Moreover, we assume for simplicity that $x^0 = 0$, $R_d \ge 1$ and $\varepsilon$ is sufficiently small.

Theorem 3.8 Under Assumptions 2.1 with $\sigma_f = 0$, 2.2 and 3.1, let $\mu, \varepsilon, \delta > 0$ and $(x^k, y^k)_{k\ge0}$ be generated by Algorithm ICFG($d^{ag}_\mu$, 0, $3\delta$, $2L_d$) with $\theta_{k+1} = \frac{1+\sqrt{1+4\theta_k^2}}{2}$ for $k \ge 1$. Assume that at each outer iteration $k$, Algorithm ICFG($\frac{\mu}{2}\mathrm{dist}_{\mathcal{K}}(G\cdot + g + \frac{1}{\mu}x^k)^2$, $f$, 0, $\mu\|G\|^2$) (if $f$ is simple, with the convention $L_f = 0$) or Algorithm ICFG($\mathcal{L}^{ag}_\mu(\cdot, x^k)$, 0, 0, $L_f + \mu\|G\|^2$) (if $\nabla f$ is $L_f$-Lipschitz continuous, $L_f > 0$) is called to obtain an approximate solution of the inner problem $u^k = u_\mu(x^k)$ such that $\mathcal{L}^{ag}_\mu(u^k, x^k) - d^{ag}_\mu(x^k) \le \delta$. Then, by setting the optimal smoothing parameter and inner accuracy:

$$\mu = \frac{16R_d^2}{\varepsilon} \quad \text{and} \quad \delta = \frac{\varepsilon}{24}, \qquad (14)$$

the average primal point $\hat{u}^k$ defined by (10) is $\varepsilon$-optimal after a total number of

$$\frac{14L_f^{1/2}D_U}{\varepsilon^{1/2}} + \frac{56R_d\|G\|D_U}{\varepsilon}$$

projections onto the primal simple feasible set $\mathcal{K} \times U$.

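Proof. First, we observe that if $R_d \ge 1$, then from Theorem 3.7 the number of outer iterations $N^{out}$ satisfies:

$$N^{out} = \left\lceil 4R_d\left(\frac{L_d}{\varepsilon}\right)^{1/2} \right\rceil = \left\lceil 4R_d\left(\frac{1}{\mu\varepsilon}\right)^{1/2} \right\rceil$$

for any $\mu > 0$, and by forcing both terms in Theorem 3.7(ii) to have magnitudes lower than $\varepsilon$, the inner accuracy $\delta$ must satisfy:

$$\delta \le \min\left\{\frac{\varepsilon}{3N^{out}},\ \frac{\varepsilon^2N^{out}}{384R_d^2L_d}\right\} = \frac{\varepsilon^2N^{out}}{384R_d^2L_d}.$$

If one chooses $\delta = \frac{\varepsilon^2N^{out}}{384R_d^2L_d}$, then this choice implies that:

$$N^{in} \le 2D_U\left(\frac{2(L_f + \mu\|G\|^2)}{\delta}\right)^{1/2} \le 28D_U\left(\frac{(L_f + \mu\|G\|^2)R_d}{\mu^{1/2}\varepsilon^{3/2}}\right)^{1/2}.$$

For simplicity, let $\mu$ satisfy $\mu \ge \frac{L_f}{\|G\|^2}$. Then, we obtain in this case the following computational complexity:

$$N^{in}N^{out} \le \frac{42\|G\|D_UR_d^{1/2}\mu^{1/4}}{\varepsilon^{3/4}}\left(\frac{4R_d}{(\mu\varepsilon)^{1/2}} + 1\right) = \frac{168\|G\|D_UR_d^{3/2}}{\mu^{1/4}\varepsilon^{5/4}} + \frac{42\|G\|D_UR_d^{1/2}\mu^{1/4}}{\varepsilon^{3/4}}.$$

Minimizing over the set $\{\mu \in \mathbb{R} \mid \mu \ge L_f/\|G\|^2\}$, we obtain that the best complexity is attained at $\mu = \max\left\{\frac{L_f}{\|G\|^2}, \frac{16R_d^2}{\varepsilon}\right\}$. For a sufficiently small $\varepsilon$, the parameter $\mu$ becomes $\mu = \frac{16R_d^2}{\varepsilon}$, which implies $N^{out} = 1$ and further leads to $\delta = \frac{\varepsilon}{24}$. Since $N^{out} = 1$, under the above choice the total number of projections onto $\mathcal{K} \times U$ required for attaining an $\varepsilon$-optimal point is given by:

$$N^{in} = \left(\frac{8D_U^2(L_f + \mu\|G\|^2)}{\delta}\right)^{1/2} \le \frac{14L_f^{1/2}D_U}{\varepsilon^{1/2}} + \frac{56\|G\|D_UR_d}{\varepsilon},$$

which proves our result. Note that if we make the convention $L_f = 0$, then we get the overall complexity for the case when $f$ is a convex and simple function. ■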

It can be observed that, for an optimal choice of the smoothing parameter $\mu$, the inexact fast gradient augmented Lagrangian method has the same computational complexity as the classical inexact gradient augmented Lagrangian method, i.e., $\mathcal{O}(1/\varepsilon)$ total projections onto the simple set $\mathcal{K} \times U$. However, we will show the superiority of the fast variant in Section 5, when we analyze the complexity of first order augmented Lagrangian methods for attaining the optimality criteria introduced in [?].
3.3 Adaptive inexact augmented Lagrangian method

We have previously seen that, in the optimal case, both the classical and the fast augmented Lagrangian methods depend on the constant $\|x^*\|$ via $R_d$, which in general is unknown a priori. Therefore, in this section we introduce implementable variants of the previous first order augmented Lagrangian methods, which approximate $\|x^*\|$ at each iteration but maintain the same optimal computational complexities as those given in the previous theorems (up to a logarithmic factor). First, we observe that in the optimal case (when $\|x^*\|$ is known), both the classical and the fast augmented Lagrangian methods perform a single outer iteration in order to attain an $\varepsilon$-optimal point. Therefore, we can intuitively apply a search procedure that finds an upper bound on $\|x^*\|$ in a logarithmic number of steps, by performing a single outer iteration and restarting the augmented Lagrangian method. It is important to observe that this restarting strategy leads to an identical scheme for both the classical and the fast augmented Lagrangian methods. Throughout this section, we assume that the gradient $\nabla f$ is Lipschitz continuous with constant $L_f > 0$ (when $f$ is simple, a similar reasoning yields the same computational complexity results).


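Algorithm A-IAL ($\mu_0, \varepsilon$)

Initialize $x^0 \in \mathbb{R}^m$. For $k \ge 0$ do:

(1) Compute $u^k \in U$ such that $\mathcal{L}^{ag}_{\mu_k}(u^k, x^k) - d^{ag}_{\mu_k}(x^k) \le \frac{\varepsilon}{3}$

(2) Update: $x^{k+1} = x^k + \mu_k\nabla_x\mathcal{L}^{ag}_{\mu_k}(u^k, x^k)$

(3) If $\mathrm{dist}_{\mathcal{K}}(Gu^k + g) \le \varepsilon$, then STOP and return $u^k$;
otherwise, set $\mu_{k+1} = 2\mu_k$, $k = k + 1$ and go to step 1.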
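As an illustration only, the sketch below runs the A-IAL loop on a toy equality-constrained instance, i.e., it assumes $\mathcal{K} = \{0\}$ and that in this case $\mathcal{L}^{ag}_\mu$ reduces to the classical augmented Lagrangian $f(u) + \langle x, Gu + g\rangle + \frac{\mu}{2}\|Gu + g\|^2$ (this reduction is an assumption of the sketch). The inner step is a plain projected gradient loop with a fixed iteration budget, not Algorithm ICFG, and it does not certify the inner accuracy of step (1). The helper names (`a_ial`, `inner_solve`) and the toy data ($f(u) = \frac{1}{2}\|u - c\|^2$, $U = [-1, 1]^2$, constraint $u_1 + u_2 = 1$) are hypothetical.

```python
from typing import List, Tuple

Vector = List[float]


def clip(v: float, lo: float = -1.0, hi: float = 1.0) -> float:
    return max(lo, min(hi, v))


def inner_solve(c: Vector, a: Vector, b: float, x: float, mu: float,
                u0: Vector, iters: int = 2000) -> Vector:
    """Projected gradient on L_mu(., x) over the box U = [-1, 1]^n, where
    L_mu(u, x) = 0.5*||u - c||^2 + x*(a.u - b) + 0.5*mu*(a.u - b)^2."""
    L = 1.0 + mu * sum(ai * ai for ai in a)  # Lipschitz constant of grad_u L_mu
    u = list(u0)
    for _ in range(iters):
        r = sum(ai * ui for ai, ui in zip(a, u)) - b  # residual Gu + g
        grad = [(ui - ci) + (x + mu * r) * ai for ui, ci, ai in zip(u, c, a)]
        u = [clip(ui - gi / L) for ui, gi in zip(u, grad)]
    return u


def a_ial(c: Vector, a: Vector, b: float, mu0: float, eps: float,
          max_outer: int = 60) -> Tuple[Vector, float, float, int]:
    x, mu = 0.0, mu0
    u = [0.0] * len(c)
    for k in range(max_outer):
        u = inner_solve(c, a, b, x, mu, u)      # step (1), approximately
        r = sum(ai * ui for ai, ui in zip(a, u)) - b
        x = x + mu * r                           # step (2): x += mu * grad_x L_mu
        if abs(r) <= eps:                        # step (3): dist_K(Gu + g) <= eps
            return u, x, mu, k
        mu *= 2.0                                # double the smoothing parameter
    return u, x, mu, max_outer


if __name__ == "__main__":
    u, x, mu, k = a_ial(c=[1.0, 1.0], a=[1.0, 1.0], b=1.0, mu0=1.0, eps=1e-3)
    print(u, x, mu, k)  # u should be close to [0.5, 0.5]
```

On this instance the feasibility test of step (3) is met after a few doublings of $\mu$, without any knowledge of $\|x^*\|$, which is the point of the adaptive scheme.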


This adaptive scheme is equivalent to the classical augmented Lagrangian method, but with an increasing smoothing parameter. Further, we present the computational complexity of this algorithm in the last primal point $u^k$ and compare it with the previous results.

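Theorem 3.9 Under Assumptions 2.1 with $\sigma_f = 0$, 2.2 and 3.1, let $\varepsilon, \mu_0 > 0$ and $(x^k)_{k\ge0}$ be the sequence generated by Algorithm A-IAL($\mu_0, \varepsilon$). Assume that at each outer iteration $k \ge 0$, Algorithm ICFG($\mathcal{L}^{ag}_{\mu_k}(\cdot, x^k)$, 0, 0, $L_f + \mu_k\|G\|^2$) is called to obtain $u^k$ such that $\mathcal{L}^{ag}_{\mu_k}(u^k, x^k) - d^{ag}_{\mu_k}(x^k) \le \frac{\varepsilon}{3}$. Then, after a total number of

$$\left\lceil \log_2\max\left\{\frac{16R_d^2}{\varepsilon\mu_0}, \frac{L_f}{\mu_0\|G\|^2}\right\} \right\rceil\left(\left(\frac{6L_fD_U^2}{\varepsilon}\right)^{1/2} + 1\right) + \frac{16\|G\|R_dD_U}{\varepsilon} + \frac{4L_f^{1/2}D_U}{\varepsilon^{1/2}}$$

projections onto the simple set $\mathcal{K} \times U$, the last primal point $u^k$ satisfies

$$-\varepsilon\|x^*\| \le f(u^k) - f^* \le \varepsilon, \qquad \mathrm{dist}_{\mathcal{K}}(Gu^k + g) \le \varepsilon. \qquad (15)$$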


Proof. From Theorem 3.5 it can be seen that the inexact gradient augmented Lagrangian method performs a single outer iteration if the optimal smoothing parameter $\mu^* = \max\left\{\frac{16R_d^2}{\varepsilon}, \frac{L_f}{\|G\|^2}\right\}$ is chosen. Therefore, by iteratively doubling an arbitrary initial value $\mu_0$ of the smoothing parameter, we attain $\mu^*$ after

$$N^{out} = \left\lceil \log_2\max\left\{\frac{16R_d^2}{\varepsilon\mu_0}, \frac{L_f}{\mu_0\|G\|^2}\right\} \right\rceil$$

iterations. If the optimal value $\mu^*$ is attained, a single A-IAL iteration would be sufficient to obtain an $\varepsilon$-optimal primal point. However, since we do not know $f^*$ in advance, we check only the feasibility criterion and stop once the prespecified accuracy is reached. This stopping criterion ensures that the final point $u^k$, when the algorithm stops, satisfies:

$$-\|x^*\|\varepsilon \le f(u^k) - f^* \le \varepsilon, \qquad \mathrm{dist}_{\mathcal{K}}(Gu^k + g) \le \varepsilon.$$

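From (11) it can be seen that the maximal number of projections performed by Algorithm A-IAL, in order to ensure the above set of criteria, is given by:

$$\sum_{k=1}^{N^{out}} N^{in}_k = \sum_{k=1}^{N^{out}} \left\lceil\left(\frac{6(L_f + \mu_k\|G\|^2)D_U^2}{\varepsilon}\right)^{1/2}\right\rceil \le \sum_{k=1}^{N^{out}}\left[\left(\frac{6L_fD_U^2}{\varepsilon}\right)^{1/2} + 1 + 2^{k/2}\frac{(6\mu_0)^{1/2}\|G\|D_U}{\varepsilon^{1/2}}\right]$$

$$\le N^{out}\left(\left(\frac{6L_fD_U^2}{\varepsilon}\right)^{1/2} + 1\right) + \max\left\{\frac{16R_d^2}{\varepsilon}, \frac{L_f}{\|G\|^2}\right\}^{1/2}\frac{4\|G\|D_U}{\varepsilon^{1/2}} \le N^{out}\left(\left(\frac{6L_fD_U^2}{\varepsilon}\right)^{1/2} + 1\right) + \frac{16\|G\|R_dD_U}{\varepsilon} + \frac{4L_f^{1/2}D_U}{\varepsilon^{1/2}},$$

which proves our statement. ■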

The above result establishes that Algorithm A-IAL performs $\mathcal{O}(1/\varepsilon)$ total projections onto the simple set $\mathcal{K} \times U$ in order to obtain a primal point satisfying (15). Note that the order of the computational complexity is the same for Algorithm A-IAL and for the inexact gradient augmented Lagrangian method. However, the adaptive scheme A-IAL has the advantage that it is implementable, i.e., the stopping criterion can be checked and the parameters of the method are computable.


3.4 Inexact first order Lagrangian method for a modified Nesterov 
smoothing 

The smoothing strategy presented in the previous sections is equivalent to the application of the classical Moreau smoothing technique to the entire dual function $d$. Unlike this classical approach, in this section we take a different path: we make use of the separable structure of the dual function (6) and we only smooth the Lagrangian part $d_U$ of the dual function, keeping the nonsmooth part $d_{\mathcal{K}}$ unchanged. Based on [?], we introduce the following smooth approximation of $d_U$:

$$d_{U,\mu}(x) = \min_{u\in U}\mathcal{L}_\mu(u, x), \quad \text{where} \quad \mathcal{L}_\mu(u, x) = f(u) + \langle x, Gu + g\rangle + \mu p_U(u),$$

where $p_U(u)$ is a simple prox-function, continuous and strongly convex on $U$. Denote $u_0 = \arg\min_{u\in U} p_U(u)$ and assume without loss of generality that $p_U(u_0) = 0$ and that its strong convexity parameter is 1. Then, we have $p_U(u) \ge \frac{1}{2}\|u - u_0\|^2$ for all $u \in U$. One typical example satisfying these assumptions is $p_U(u) = \frac{1}{2}\|u\|^2$ (when $0 \in U$). The function $d_{U,\mu}$ has Lipschitz continuous gradient [?]:

$$\|\nabla d_{U,\mu}(x) - \nabla d_{U,\mu}(y)\| \le L_d\|x - y\| \quad \forall x, y \in \mathbb{R}^m, \quad \text{with constant } L_d = \frac{\|G\|^2}{\mu}.$$

First, note that if $\mu = 0$, then we recover the classical Lagrangian and dual functions. Secondly, the gradient of $d_{U,\mu}$ satisfies:

$$\nabla d_{U,\mu}(x) = Gu^*_\mu(x) + g, \quad \text{where} \quad u^*_\mu(x) \in \arg\min_{u\in U}\mathcal{L}_\mu(u, x).$$

Moreover, in the sequel we use the following notation for the partial gradient of $\mathcal{L}_\mu$:

$$\nabla_x\mathcal{L}_\mu(u, x) = Gu + g.$$
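A minimal sketch of this gradient formula, under assumptions that are not part of the general setting: $f(u) = \langle c, u\rangle$ is linear, $U = [-1, 1]^n$ is a box and $p_U(u) = \frac{1}{2}\|u\|^2$. Then the inner problem $\min_{u\in U}\langle c + G^Tx, u\rangle + \frac{\mu}{2}\|u\|^2$ separates per coordinate and $u^*_\mu(x)$ is obtained by clipping. The function name `smoothed_dual` is hypothetical.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def smoothed_dual(x: Vector, c: Vector, G: Matrix, g: Vector, mu: float,
                  lo: float = -1.0, hi: float = 1.0) -> Tuple[float, Vector, Vector]:
    """Return d_{U,mu}(x), its gradient G u*(x) + g, and u*(x).

    Assumes f(u) = <c, u>, U = [lo, hi]^n and p_U(u) = 0.5*||u||^2.
    """
    n, m = len(c), len(g)
    w = [c[j] + sum(G[i][j] * x[i] for i in range(m)) for j in range(n)]  # c + G^T x
    u = [max(lo, min(hi, -wj / mu)) for wj in w]                          # u*_mu(x)
    r = [sum(G[i][j] * u[j] for j in range(n)) + g[i] for i in range(m)]   # G u* + g
    value = (sum(cj * uj for cj, uj in zip(c, u))
             + sum(xi * ri for xi, ri in zip(x, r))
             + 0.5 * mu * sum(uj * uj for uj in u))
    return value, r, u


if __name__ == "__main__":
    val, grad, u = smoothed_dual([0.3], [0.5, -1.0], [[1.0, 2.0]], [-1.0], mu=1.0)
    print(val, grad, u)
```

Comparing `grad` with a central finite difference of `value` is a quick way to check that the returned vector is indeed $\nabla d_{U,\mu}(x)$, both where $u^*_\mu(x)$ is interior and where it sits on the boundary of the box.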

The smoothed dual function $d_{U,\mu}$ leads to a novel smooth approximation of the composite dual function $d$, which we aim to maximize using the fast gradient method:

$$d^*_\mu = \max_{x\in\mathbb{R}^m} d_\mu(x) \quad (= d_{U,\mu}(x) + d_{\mathcal{K}}(x)).$$

We denote by $X^*_\mu = \arg\max_{x\in\mathbb{R}^m} d_\mu(x)$ the optimal solution set of the smoothed dual problem and by $x^*_\mu$ an optimal point. It is important to note that, in many cases, $u^*_\mu(x)$ cannot be computed exactly, but only within a pre-specified accuracy, which leads us to the inexact framework introduced in the previous section. Thus, in the rest of the section we define $u_\mu(x) \in U$ as the inexact solution of the inner problem satisfying:

$$0 \le \mathcal{L}_\mu(u_\mu(x), x) - d_{U,\mu}(x) \le \delta. \qquad (16)$$

Then, we can derive a first order inexact oracle for the smoothed dual function $d_{U,\mu}$:

Theorem 3.10 Let $\mu, \delta > 0$. Then we have the following first order inexact $(3\delta, 2L_d)$-oracle for the smoothed dual function $d_{U,\mu}$:

$$0 \le \mathcal{L}_\mu(u_\mu(y), y) + \langle\nabla_x\mathcal{L}_\mu(u_\mu(y), y), x - y\rangle - d_{U,\mu}(x) \le L_d\|x - y\|^2 + 3\delta \qquad (17)$$

for all $x, y \in \mathbb{R}^m$, where $u_\mu(y) \in U$ satisfies (16) and $L_d = \frac{\|G\|^2}{\mu}$.

Proof. The proof is similar to the one given in Theorem 3.2 and thus we omit it. ■

The relation (17) implies that the smoothed dual function $d_{U,\mu}$ is equipped with a $(3\delta, 2L_d)$-oracle, i.e., $\phi_{\delta,L}(x) = \mathcal{L}_\mu(u_\mu(x), x)$ and $\nabla\phi_{\delta,L}(x) = \nabla_x\mathcal{L}_\mu(u_\mu(x), x) = Gu_\mu(x) + g$. We notice that there are some previous results on the application of Nesterov's smoothing technique for solving the dual of linear equality constrained convex problems [?], but these algorithms require the exact solution of the inner subproblem and derive more conservative convergence estimates. Further, we estimate the rate of convergence of Algorithm ICFG on the modified Nesterov smoothing of the dual function. First, let us redefine the following finite quantity:

$$R_d = \max_{\mu\in C}\ \min_{x^*_\mu\in X^*_\mu}\|x^0 - x^*_\mu\| < \infty,$$

where $C$ is a compact set in $\mathbb{R}_+$. From [17, Lemma 1] it follows immediately that such an $R_d$ is always finite for $C = [0, c]$, with $0 < c < \infty$, provided that a Slater vector exists. Note that we can also bound $\min_{x^*\in X^*}\|x^0 - x^*\| \le R_d$. Using Theorem 2.3, we get the following estimate on dual suboptimality:


Corollary 3.11 Under Assumptions 2.1 with $\sigma_f = 0$, 2.2 and 3.1, let $\mu, \delta > 0$ and $(x^k, y^k)_{k\ge0}$ be the sequences generated by Algorithm ICFG($d_{U,\mu}$, $d_{\mathcal{K}}$, $3\delta$, $2L_d$), with $\theta_{k+1} = \frac{1+\sqrt{1+4\theta_k^2}}{2}$ for $k \ge 1$. Then, we have the following estimate on dual suboptimality:

$$d^*_\mu - d_\mu(x^k) \le \frac{4L_dR_d^2}{(k+1)^2} + 3k\delta.$$


We further estimate the rate of convergence of the average primal sequence generated by ICFG on the modified Nesterov smoothing of the dual function. For simplicity of exposition, we further assume that $x^0 = 0$, $u^0 = 0$, $R_d \ge 1$, $\|G\| \ge 1$, $D_U \ge 1$ and $\varepsilon \le 1$. If one of these conditions does not hold, the order of our results does not change; only the constants differ slightly. Using these simplifications, we obtain the following outer iteration complexity for ICFG.


Theorem 3.12 Under Assumptions 2.1 with $\sigma_f = 0$, 2.2 and 3.1, let $\mu, \delta > 0$ and $(x^k, y^k)_{k\ge0}$ be the sequence generated by Algorithm ICFG($d_{U,\mu}$, $d_{\mathcal{K}}$, $3\delta$, $2L_d$) with $\theta_{k+1} = \frac{1+\sqrt{1+4\theta_k^2}}{2}$ for $k \ge 1$. Define $u^i = u_\mu(x^i)$ such that $\mathcal{L}_\mu(u^i, x^i) - d_{U,\mu}(x^i) \le \delta$. For any fixed number of outer iterations $K \ge 1$, if we set $\mu(K) = \frac{2^{3/2}\|G\|R_d}{D_UK}$, the average primal point $\hat{u}^K$ defined by (10) satisfies:

(i) The primal infeasibility is bounded sublinearly as follows:

$$\mathrm{dist}_{\mathcal{K}}(G\hat{u}^K + g) \le \frac{2^{3/2}\|G\|D_U}{K} + 2\left(\frac{2\|G\|D_U\delta}{R_d}\right)^{1/2}. \qquad (18)$$

(ii) The primal suboptimality gap is bounded sublinearly by:

$$-\frac{2^{3/2}\|G\|D_UR_d}{K} - 2\left(2\|G\|D_UR_d\delta\right)^{1/2} \le f(\hat{u}^K) - f^* \le \frac{2^{3/2}\|G\|R_dD_U}{K} + 3K\delta. \qquad (19)$$

Proof. This proof is similar to the one given in Appendix A.2. It is also given in the companion paper [?, Appendix A.3]. ■


It is important to remark that if the functions $f$ and $p_U$ are simple, then, by definition, the solution of the inner problem $\min_{u\in U}\mathcal{L}_\mu(u, x)$ can be found efficiently (e.g., in linear time or even in closed form). Otherwise, $\mathcal{L}_\mu(\cdot, x)$ is a composition of a function $f$ whose gradient is Lipschitz continuous with constant $L_f > 0$ and a $\mu$-strongly convex and simple function $\mu p_U$; thus, Nesterov's optimal method for composite problems with a strongly convex part and a smooth part [21] finds an approximate solution $u_\mu(x)$ of the inner problem satisfying $\mathcal{L}_\mu(u_\mu(x), x) - d_{U,\mu}(x) \le \delta$ in

$$N^{in} = \left\lceil\left(\frac{L_f}{\mu}\right)^{1/2}\log\left(\frac{L_fD_U^2}{\delta}\right)\right\rceil \qquad (20)$$
projections onto the simple set $U$. Now we are ready to derive the overall iteration complexity of Algorithm ICFG in this case:

Theorem 3.13 Under Assumptions 2.1 with $\sigma_f = 0$, 2.2 and 3.1, let $\mu, \delta > 0$ and $\varepsilon > 0$, and let the sequences $(x^k, y^k)_{k\ge0}$ be generated by Algorithm ICFG($d_{U,\mu}$, $d_{\mathcal{K}}$, $3\delta$, $2L_d$) with $\theta_{k+1} = \frac{1+\sqrt{1+4\theta_k^2}}{2}$ for all $k \ge 1$. Also let $N^{out} = \left\lceil\frac{6\|G\|D_UR_d}{\varepsilon}\right\rceil$. Assume further that the primal average point $\hat{u}^k$ is given by (10). Then the following assertions hold:

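(i) If the function $f$ is simple, then by setting the optimal smoothing parameter and inner accuracy

$$\mu = \frac{2^{3/2}\|G\|R_d}{D_UN^{out}} \quad \text{and} \quad \delta = 0,$$

the primal average point $\hat{u}^k$ is $\varepsilon$-optimal after

$$k = \left\lceil\frac{6\|G\|R_dD_U}{\varepsilon}\right\rceil$$

projections onto the primal feasible set $U$ and the polar cone $\mathcal{K}^*$.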

(ii) If the function $f$ is not simple, but $\nabla f$ is Lipschitz continuous with $L_f > 0$, and at each outer iteration $k \ge 1$ Nesterov's optimal method for strongly convex and smooth objective functions [21] is called to obtain an approximate optimal point $u^k = u_\mu(x^k) \in U$ such that $\mathcal{L}_\mu(u^k, x^k) - d_{U,\mu}(x^k) \le \delta$, then by setting the optimal smoothing parameter and inner accuracy

$$\mu = \frac{2^{3/2}\|G\|R_d}{D_UN^{out}} \quad \text{and} \quad \delta = \frac{\varepsilon}{6N^{out}},$$

the average primal point $\hat{u}^k$ is $\varepsilon$-optimal after at most

$$\left(\frac{24\|G\|R_dD_U^2L_f^{1/2}}{\varepsilon^{3/2}} + \frac{12L_f^{1/2}\|G\|D_UR_d}{\varepsilon}\right)\log\left(\frac{36\|G\|R_dL_fD_U^3}{\varepsilon^2}\right) + \frac{12\|G\|D_UR_d}{\varepsilon}$$

projections onto the set $U$ and

$$\left\lceil\frac{6\|G\|R_dD_U}{\varepsilon}\right\rceil$$

projections onto the cone $\mathcal{K}^*$.


Proof. By forcing both sides in (19) to be equal to $\varepsilon$, we obtain

$$N^{out} = \left\lceil\frac{6\|G\|D_UR_d}{\varepsilon}\right\rceil \qquad (21)$$

outer projections onto $\mathcal{K}^*$, and the inner accuracy satisfies (provided that $L_f > 0$):

$$\delta = \min\left\{\frac{\varepsilon}{6N^{out}},\ \frac{\varepsilon^2}{8\|G\|D_UR_d}\right\} = \frac{\varepsilon^2}{36\|G\|D_UR_d}. \qquad (22)$$

Considering the optimal choice of the smoothing parameter (see [?, Appendix A.3]) and taking into account the bound (21), we get:

$$\mu(N^{out}) = \frac{2^{3/2}\|G\|R_d}{D_UN^{out}}.$$

Further, using (22) and the smoothing parameter in the inner complexity (20), we get the following bound on the total number of inner iterations:

$$\sum_{k=1}^{N^{out}}N^{in}_k \le \frac{12\|G\|D_UR_d}{\varepsilon}\left(\left(\frac{L_f}{\mu}\right)^{1/2}\log\left(\frac{36\|G\|R_dL_fD_U^3}{\varepsilon^2}\right) + 1\right) \le \left(\frac{24\|G\|R_dD_U^2L_f^{1/2}}{\varepsilon^{3/2}} + \frac{12L_f^{1/2}\|G\|D_UR_d}{\varepsilon}\right)\log\left(\frac{36\|G\|R_dL_fD_U^3}{\varepsilon^2}\right) + \frac{12\|G\|D_UR_d}{\varepsilon}.$$

These bounds confirm our result. ■

Remark 2 If the objective function $f$ is strongly convex, i.e., it satisfies Assumption 2.1 with $\sigma_f > 0$, and has Lipschitz gradient with constant $L_f > 0$, then it is well known that the dual function $d$ has Lipschitz gradient with constant $L_d = \frac{\|G\|^2}{\sigma_f}$ [13], and therefore any smoothing technique is redundant. In this setting, using the first order inexact oracle framework from the previous sections, we can easily derive an overall complexity of Algorithm ICFG for the average primal point $\hat{u}^k$ of order $\mathcal{O}\left(\frac{1}{\varepsilon^{1/2}}\log\frac{1}{\varepsilon}\right)$ projections onto the set $U$ and $\mathcal{O}\left(\frac{1}{\varepsilon^{1/2}}\right)$ projections onto the cone $\mathcal{K}$; see [13] for more details. ■

Remark 3 If we assume that there exists a bound $R_p$ such that $\max_{x\in\mathcal{K}^*}\|u^0 - u_\mu(x)\| \le R_p < \infty$, then we can remove the boundedness assumption on $U$ (i.e., Assumption 2.2(ii)), and all the previous complexity results hold with $D_U$ replaced by $R_p$. ■

In conclusion, the inexact fast gradient method for the modified Nesterov smoothing of the dual function performs the same number $\mathcal{O}(1/\varepsilon)$ of projections onto the cone as the previous inexact first order augmented Lagrangian methods. However, for smooth objective functions, the number $\mathcal{O}(1/\varepsilon^{3/2})$ of projections onto $U$ performed by the Nesterov smoothing method (up to a logarithmic factor) is significantly larger than in the previous augmented Lagrangian smoothing methods. On the other hand, the optimal smoothing parameter $\mu$ in the augmented Lagrangian framework cannot be fixed a priori due to its dependence on $\|x^*\|$ via $R_d$, and thus we need an adaptive scheme, while the optimal choice of $\mu$ in the case of the Nesterov smoothing strategy can be easily computed in the initialization phase according to Theorem 3.13.
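A short calculation illustrates the last claim. Substituting $N^{out} = \lceil 6\|G\|D_UR_d/\varepsilon\rceil \ge 6\|G\|D_UR_d/\varepsilon$ into the smoothing parameter of Theorem 3.13, both $R_d$ and $\|G\|$ cancel, so only $\varepsilon$ and $D_U$ are needed:

$$\mu = \frac{2^{3/2}\|G\|R_d}{D_UN^{out}} \le \frac{2^{3/2}\|G\|R_d}{D_U}\cdot\frac{\varepsilon}{6\|G\|D_UR_d} = \frac{\sqrt{2}\,\varepsilon}{3D_U^2},$$

with equality whenever $6\|G\|D_UR_d/\varepsilon$ is an integer (i.e., when the ceiling is inactive).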


4. First order penalty methods 

The complexity analysis of the primal-dual methods from Section 3 has been based on Assumption 3.1. Also, most papers on penalty methods make the strong Assumption 3.1, that is, that there exists an optimal Lagrange multiplier for the primal convex problem (1). This property is usually guaranteed through a Slater type condition, which in large-scale settings is very difficult to check computationally, or which might not even hold. In this section we remove Assumption 3.1 and analyze various penalty strategies for solving the conic constrained convex optimization problem (1) without this assumption. Therefore, we now consider the conic convex problem (1), which does not necessarily admit a Lagrange multiplier that closes the duality gap. To the best of our knowledge, this is one of the first computational complexity results for first order penalty methods for conic problems that do not assume the existence of a Lagrange multiplier closing the duality gap.

First, denote $f_* = \min_{u\in U} f(u)$. Given the difficulties induced by the linear conic constraints, the original problem (1) can be reformulated in this case, using a (non)differentiable penalty function, as an optimization problem with simple constraints. Therefore, for a penalty parameter $\rho > 0$, the basic penalty reformulations of problem (1) are as follows:

$$\min_{u\in U}\ \psi_\rho(u) \quad \left(= f(u) + \frac{\rho}{2}\mathrm{dist}_{\mathcal{K}}(Gu + g)^2\right), \qquad (23)$$

$$\min_{u\in U}\ \phi_\rho(u) \quad \left(= f(u) + \rho\,\mathrm{dist}_{\mathcal{K}}(Gu + g)\right). \qquad (24)$$

Depending on the context, we denote $u^*_\rho \in \arg\min_{u\in U}\psi_\rho(u)$ or $u^*_\rho \in \arg\min_{u\in U}\phi_\rho(u)$. It is well known that both formulations have certain advantages and disadvantages. The differentiable formulation (23) has good smoothness properties, but it is regarded as an inexact penalty problem, i.e., we have $u^*_\rho \to u^* \in U^*$ only as $\rho \to \infty$. On the other hand, the nondifferentiable formulation (24) lacks smoothness, but when optimal Lagrange multipliers for (1) exist, there is a finite threshold $\rho^* > 0$ such that for any $\rho > \rho^*$ we have $u^*_\rho = u^* \in U^*$. We recall the convexity property of the distance function:

$$\mathrm{dist}_{\mathcal{K}}(Gu + g) \ge \mathrm{dist}_{\mathcal{K}}(Gv + g) + \langle G^Ts(v), u - v\rangle \quad \forall u, v \in \mathbb{R}^n, \qquad (25)$$

where $s(v) \in \partial\,\mathrm{dist}_{\mathcal{K}}(Gv + g)$, so that $G^Ts(v)$ is a subgradient of the function $\mathrm{dist}_{\mathcal{K}}(G\cdot + g)$ at $v$. From (25), it can be easily seen that for any $u \in \mathbb{R}^n$ such that $Gu + g \in \mathcal{K}$ we have:

$$\langle s(v), Gv - Gu\rangle \ge \mathrm{dist}_{\mathcal{K}}(Gv + g) \quad \forall v \in \mathbb{R}^n. \qquad (26)$$

Further, we analyze both penalty strategies combined with the fast gradient method and derive their overall complexities.
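For a concrete cone, inequalities (25) and (26) can be checked numerically. The sketch below assumes $\mathcal{K} = \mathbb{R}^m_+$ (an illustrative choice), for which $\mathrm{dist}_{\mathcal{K}}(z)$ is the norm of the negative part of $z$ and $s = (z - \Pi_{\mathcal{K}}(z))/\mathrm{dist}_{\mathcal{K}}(z)$ is a subgradient whenever $z \notin \mathcal{K}$ (and $s = 0$ otherwise). All function names are hypothetical.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def affine(G: Matrix, u: Vector, g: Vector) -> Vector:
    """Return G u + g."""
    return [sum(row[j] * u[j] for j in range(len(u))) + gi for row, gi in zip(G, g)]


def dist_orthant(z: Vector) -> float:
    """dist_K(z) for K = R^m_+: Euclidean norm of the negative part of z."""
    return math.sqrt(sum(min(zi, 0.0) ** 2 for zi in z))


def subgrad_orthant(z: Vector) -> Vector:
    """One element s of the subdifferential of dist_K at z (K = R^m_+)."""
    d = dist_orthant(z)
    if d == 0.0:
        return [0.0] * len(z)               # z in K: s = 0 is a valid subgradient
    return [min(zi, 0.0) / d for zi in z]   # (z - Proj_K(z)) / dist_K(z)


def gap_25(G: Matrix, g: Vector, u: Vector, v: Vector) -> float:
    """Left-hand side minus right-hand side of (25); nonnegative up to rounding."""
    zv = affine(G, v, g)
    s = subgrad_orthant(zv)
    Gts = [sum(G[i][j] * s[i] for i in range(len(s))) for j in range(len(v))]
    rhs = dist_orthant(zv) + sum(a * (ui - vi) for a, ui, vi in zip(Gts, u, v))
    return dist_orthant(affine(G, u, g)) - rhs


def gap_26(G: Matrix, g: Vector, u: Vector, v: Vector) -> float:
    """<s(v), Gv - Gu> - dist_K(Gv + g); nonnegative whenever G u + g lies in K."""
    zv = affine(G, v, g)
    s = subgrad_orthant(zv)
    zero = [0.0] * len(g)
    Gv, Gu = affine(G, v, zero), affine(G, u, zero)
    return sum(si * (a - b) for si, a, b in zip(s, Gv, Gu)) - dist_orthant(zv)


if __name__ == "__main__":
    random.seed(1)
    G = [[random.uniform(-2, 2) for _ in range(2)] for _ in range(3)]
    g = [random.uniform(-2, 2) for _ in range(3)]
    print(gap_25(G, g, [0.5, -1.0], [1.0, 0.3]))
```

Random sampling cannot prove (25), of course; it only guards against sign errors when implementing the subgradient.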


4.1 Fast gradient differentiable penalty method 

If the gradient $\nabla f$ is Lipschitz continuous with constant $L_f > 0$, then the penalty function $\psi_\rho$ also has Lipschitz continuous gradient with constant $L_\psi = L_f + \rho\|G\|^2$. Note that the optimality conditions of (23) are:

$$\langle\nabla f(u^*_\rho) + \rho\,\mathrm{dist}_{\mathcal{K}}(Gu^*_\rho + g)\,G^Ts(u^*_\rho),\ u - u^*_\rho\rangle \ge 0 \quad \forall u \in U. \qquad (27)$$

Now, we state our result regarding the computational complexity of the penalty method with differentiable penalty, for simple objective functions (set $L_f = 0$ in the complexity estimate) or smooth objective functions with Lipschitz continuous gradients (i.e., $L_f > 0$). Define $\Delta^* = f^* - f_*$.


Theorem 4.1 Under Assumptions 2.1 with $\sigma_f = 0$ and 2.2, let $\rho > 0$, $\varepsilon \in (0, \Delta^*/2)$ and $(u^k, v^k)_{k\ge0}$ be the sequence generated by Algorithm ICFG($\psi_\rho$, 0, 0, $L_\psi$) with $\theta_{k+1} = \frac{1+\sqrt{1+4\theta_k^2}}{2}$ for $k \ge 1$. If the penalty parameter satisfies

$$\rho \ge \frac{4\Delta^*}{\varepsilon^2} \qquad (28)$$

and $f$ is simple ($L_f = 0$) or $\nabla f$ is Lipschitz continuous ($L_f > 0$), then after

$$k = \left\lceil\left(\frac{2L_fD_U^2}{\varepsilon}\right)^{1/2} + \frac{(8\Delta^*)^{1/2}\|G\|D_U}{\varepsilon^{3/2}}\right\rceil$$

projections onto the simple set $\mathcal{K} \times U$, we have:

$$-\Delta^* \le f(u^k) - f^* \le \varepsilon, \qquad \mathrm{dist}_{\mathcal{K}}(Gu^k + g) \le \varepsilon. \qquad (29)$$

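Proof. First, observe that by taking $u = u^*$ in (27) and using (26), we obtain:

$$\langle\nabla f(u^*_\rho), u^* - u^*_\rho\rangle \ge \rho\,\mathrm{dist}_{\mathcal{K}}(Gu^*_\rho + g)^2.$$

Taking into account that $f(u^*_\rho) \ge f_*$, the convexity of $f$ then yields:

$$\mathrm{dist}_{\mathcal{K}}(Gu^*_\rho + g) \le \left(\frac{f(u^*) - f(u^*_\rho)}{\rho}\right)^{1/2} \le \left(\frac{f^* - f_*}{\rho}\right)^{1/2} = \left(\frac{\Delta^*}{\rho}\right)^{1/2}.$$

Therefore, a sufficient condition for $\mathrm{dist}_{\mathcal{K}}(Gu^*_\rho + g) \le \varepsilon/2$ is $\rho \ge \frac{4\Delta^*}{\varepsilon^2}$. Let $u \in U$ satisfy:

$$f(u) + \frac{\rho}{2}\mathrm{dist}_{\mathcal{K}}(Gu + g)^2 - f(u^*_\rho) - \frac{\rho}{2}\mathrm{dist}_{\mathcal{K}}(Gu^*_\rho + g)^2 = \psi_\rho(u) - \psi^*_\rho \le \varepsilon. \qquad (30)$$

Using the convexity of $f$, i.e., $f(u) \ge f(u^*_\rho) + \langle\nabla f(u^*_\rho), u - u^*_\rho\rangle$, and (27), the relation (30) implies (writing $r(u) = \mathrm{dist}_{\mathcal{K}}(Gu + g)$ for brevity):

$$\begin{aligned}
\varepsilon &\ge \frac{\rho}{2}r(u)^2 - \frac{\rho}{2}r(u^*_\rho)^2 + \langle\nabla f(u^*_\rho), u - u^*_\rho\rangle\\
&\ge \frac{\rho}{2}r(u)^2 - \frac{\rho}{2}r(u^*_\rho)^2 + \rho\,r(u^*_\rho)\langle G^Ts(u^*_\rho), u^*_\rho - u\rangle\\
&= \frac{\rho}{2}r(u)^2 + \frac{\rho}{2}r(u^*_\rho)^2 - \rho\,r(u^*_\rho)\left(r(u^*_\rho) + \langle G^Ts(u^*_\rho), u - u^*_\rho\rangle\right)\\
&\ge \frac{\rho}{2}r(u)^2 + \frac{\rho}{2}r(u^*_\rho)^2 - \rho\,r(u^*_\rho)\,r(u)\\
&= \frac{\rho}{2}\left(r(u^*_\rho) - r(u)\right)^2,
\end{aligned}$$

where the third inequality follows from (25). The last relation leads to:

$$\mathrm{dist}_{\mathcal{K}}(Gu + g) \le \sqrt{\frac{2\varepsilon}{\rho}} + \mathrm{dist}_{\mathcal{K}}(Gu^*_\rho + g).$$

For a penalty parameter satisfying (28) and $\varepsilon < \Delta^*/2$, we reach $\varepsilon$-infeasibility:

$$\mathrm{dist}_{\mathcal{K}}(Gu + g) \le \frac{\varepsilon^{3/2}}{\sqrt{2\Delta^*}} + \frac{\varepsilon}{2} < \varepsilon.$$

To obtain suboptimality bounds, first note that the left inequality $f(u) - f(u^*) \ge -\Delta^*$ is trivial, since $f(u) \ge f_*$. Second, since $\psi^*_\rho \le \psi_\rho(u^*) = f(u^*)$, the relation (30) implies:

$$f(u) - f(u^*) \le f(u) + \frac{\rho}{2}\mathrm{dist}_{\mathcal{K}}(Gu + g)^2 - f(u^*) \le \psi_\rho(u) - \psi^*_\rho \le \varepsilon.$$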
By choosing $\rho \ge \frac{4\Delta^*}{\varepsilon^2}$ and solving the differentiable penalty problem (23) with accuracy $\varepsilon$, we obtain an $\varepsilon$-optimal point of the original problem (1), which satisfies the optimality criteria (29). For any fixed penalty parameter $\rho > 0$, Algorithm ICFG($\psi_\rho$, 0, 0, $L_\psi$) generates a sequence $(u^k)_{k\ge0}$ with the convergence rate (see Theorem 2.3):

$$\psi_\rho(u^k) - \psi^*_\rho \le \frac{2(L_f + \rho\|G\|^2)D_U^2}{(k+1)^2}.$$

This rate of convergence implies that after

$$k = \left\lceil\left(\frac{2(L_f + \rho\|G\|^2)D_U^2}{\varepsilon}\right)^{1/2}\right\rceil$$

projections onto $\mathcal{K} \times U$, we get $\psi_\rho(u^k) - \psi^*_\rho \le \varepsilon$. Further, taking $\rho = \frac{4\Delta^*}{\varepsilon^2}$, the smallest value allowed by (28), we can bound the previous estimate as:

$$k \le \left\lceil\left(\frac{2L_fD_U^2}{\varepsilon}\right)^{1/2} + \left(\frac{8\Delta^*\|G\|^2D_U^2}{\varepsilon^3}\right)^{1/2}\right\rceil.$$

Note that the last estimate implies our result. ■
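The following Python sketch runs the setting of Theorem 4.1 on a small hypothetical instance: $f(u) = \frac{1}{2}\|u - c\|^2$ with $c = (1, 1)$, $U = [-1, 1]^2$ and the conic constraint $-(u_1 + u_2) \in \mathcal{K} = \mathbb{R}_+$, so that $f^* = 1$ (attained at $u^* = (0, 0)$), $f_* = 0$ and $\Delta^* = 1$. An accelerated projected gradient loop with the same $\theta_k$ update stands in for Algorithm ICFG; the penalty parameter is $\rho = 4\Delta^*/\varepsilon^2$ and the iteration count is the bound $\lceil(2L_\psi D_U^2/\varepsilon)^{1/2}\rceil$ used in the proof, with $D_U$ the diameter of the box.

```python
import math
from typing import Callable, List, Tuple

Vector = List[float]


def fista_box(grad: Callable[[Vector], Vector], L: float, u0: Vector,
              iters: int, lo: float = -1.0, hi: float = 1.0) -> Vector:
    """Accelerated projected gradient on the box [lo, hi]^n with step 1/L."""
    u_prev, y, theta = list(u0), list(u0), 1.0
    for _ in range(iters):
        u = [max(lo, min(hi, yi - gi / L)) for yi, gi in zip(y, grad(y))]
        theta_next = (1.0 + math.sqrt(1.0 + 4.0 * theta * theta)) / 2.0
        beta = (theta - 1.0) / theta_next
        y = [ui + beta * (ui - upi) for ui, upi in zip(u, u_prev)]
        u_prev, theta = u, theta_next
    return u_prev


def quadratic_penalty_demo(eps: float = 0.1) -> Tuple[Vector, float, float, int]:
    c = [1.0, 1.0]
    f_star, f_low = 1.0, 0.0          # f^* at u* = (0, 0); f_* = min over U, at u = c
    delta = f_star - f_low            # Delta^* = f^* - f_*
    rho = 4.0 * delta / eps ** 2      # smallest value allowed by (28)
    L_psi = 1.0 + rho * 2.0           # L_f + rho * ||G||^2, with ||G||^2 = 2
    D_U = 2.0 * math.sqrt(2.0)        # diameter of the box U
    iters = math.ceil(math.sqrt(2.0 * L_psi * D_U ** 2 / eps))

    def grad_psi(u: Vector) -> Vector:
        viol = max(u[0] + u[1], 0.0)  # dist_K(Gu + g)
        return [(u[0] - c[0]) + rho * viol, (u[1] - c[1]) + rho * viol]

    u = fista_box(grad_psi, L_psi, [0.0, 0.0], iters)
    f_val = 0.5 * ((u[0] - c[0]) ** 2 + (u[1] - c[1]) ** 2)
    return u, f_val - f_star, max(u[0] + u[1], 0.0), iters


if __name__ == "__main__":
    u, gap, infeas, iters = quadratic_penalty_demo(0.1)
    print(u, gap, infeas, iters)
```

On this instance the returned point is both $\varepsilon$-feasible and $\varepsilon$-suboptimal, as predicted by (29); its objective value lies slightly below $f^*$ because the quadratic penalty is an inexact penalty.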

The following simple example shows the tightness of our result given in Theorem 4.1.

Example 1 Given $p > 1$, consider the following convex problem:

$$\min_{u\in\mathbb{R}^2} f(u)\ (:= u_2) \quad \text{s.t.} \quad |u_2|^p \le u_1,\ u_1 = 0,$$

where $U = \{u \in \mathbb{R}^2 \mid |u_2|^p \le u_1\}$. Note that the feasible set contains only the trivial point $(0, 0)$, so that $f^* = 0$, and implicitly we have $u_1 \ge 0$. The Slater condition does not hold in this case. First, we show that this optimization problem does not admit a Lagrange multiplier closing the duality gap. The dual problem of the above example is given by:

$$\max_{x\in\mathbb{R}}\ \min_{u\in\mathbb{R}^2}\ u_2 + xu_1 \quad \text{s.t.} \quad |u_2|^p \le u_1.$$

Since the objective function is linear, the inner minimum is attained on the boundary $|u_2|^p = u_1$, and an equivalent form of the dual problem is:

$$\max_{x\in\mathbb{R}}\ \min_{u_1\ge0}\ \pm u_1^{1/p} + xu_1.$$

Considering the case $u_2 = -u_1^{1/p}$ (for the other case we can use the same reasoning), with the implicit constraint $u_1 \ge 0$, the optimal solution of this minimization subproblem (which is finite only for $x > 0$) is given by $u_1 = (px)^{\frac{p}{1-p}}$. Replacing this value into the cost, and taking into account that we have to keep $u_1 \ge 0$, we obtain the dual problem:

$$\sup_{x>0}\ \frac{1-p}{p}\,(px)^{\frac{1}{1-p}}.$$

The dual function is negative for any $x > 0$ and approaches $f^* = 0$ only as $x \to \infty$; thus, we do not have a bounded Lagrange multiplier attaining the supremum. Further, we estimate the value of the penalty parameter $\rho$ such that we get $\varepsilon$-infeasibility for $u^*_\rho$. The quadratic penalty reformulation is given by:

$$\min_{u\in\mathbb{R}^2}\ u_2 + \frac{\rho}{2}u_1^2 \quad \text{s.t.} \quad |u_2|^p \le u_1.$$

Observe that the minimizer $u^*_\rho$ of the above problem is on the boundary of the feasible set, i.e., $|u_2|^p = u_1$. Then, we get the following equivalent problem:

$$\min_{u_2\in\mathbb{R}}\ u_2 + \frac{\rho}{2}|u_2|^{2p}.$$

The optimality condition of the above problem is $1 + \rho p\,|(u^*_\rho)_2|^{2p-2}(u^*_\rho)_2 = 0$, which immediately implies:

$$(u^*_\rho)_2 = -\left(\frac{1}{\rho p}\right)^{\frac{1}{2p-1}}. \qquad (31)$$

From this expression and the fact that $|(u^*_\rho)_2|^p = (u^*_\rho)_1$, it can be derived that $\varepsilon$-infeasibility is attained, i.e., $|(u^*_\rho)_1| \le \varepsilon$, provided that the penalty parameter satisfies:

$$\rho \ge \frac{1}{p}\left(\frac{1}{\varepsilon}\right)^{2-\frac{1}{p}}.$$

Observing that the right-hand side equals $\varepsilon^{-2}\,\frac{\varepsilon^{1/p}}{p}$ and that, for $\varepsilon < 1/e$, the function $p \mapsto \frac{\varepsilon^{1/p}}{p}$ attains its maximum over $p > 1$ at $p^* = \ln(1/\varepsilon)$, we consider this worst-case exponent. Replacing this value in the above estimate, we have:

$$\rho \ge \frac{1}{\ln(1/\varepsilon)}\,\varepsilon^{\frac{1}{\ln(1/\varepsilon)}}\left(\frac{1}{\varepsilon}\right)^2 = \frac{1}{e\ln(1/\varepsilon)}\left(\frac{1}{\varepsilon}\right)^2,$$

where $e$ is Euler's number. Therefore, for this example the penalty parameter must be of order $1/\varepsilon^2$ (up to a logarithmic factor), which confirms the tightness of our result given in Theorem 4.1. ■
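The computations of Example 1 can be checked numerically. The sketch below (an illustration with the hypothetical choice $\varepsilon = 10^{-3}$) locates the worst-case exponent on a grid, evaluates the closed form (31) at $p^* = \ln(1/\varepsilon)$ and compares it with a direct one-dimensional minimization of $u_2 + \frac{\rho}{2}|u_2|^{2p}$ by ternary search, which is valid because this function is convex.

```python
import math


def rho_threshold(p: float, eps: float) -> float:
    """Penalty level from Example 1: rho >= (1/p) * (1/eps)**(2 - 1/p)."""
    return (1.0 / p) * (1.0 / eps) ** (2.0 - 1.0 / p)


def penalty_minimizer(p: float, rho: float) -> float:
    """Closed form (31): (u*_rho)_2 = -(1/(rho p))**(1/(2p - 1))."""
    return -(1.0 / (rho * p)) ** (1.0 / (2.0 * p - 1.0))


def ternary_min(fun, lo: float, hi: float, iters: int = 300) -> float:
    """Minimize a convex function of one variable on [lo, hi]."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if fun(m1) < fun(m2):
            hi = m2
        else:
            lo = m1
    return 0.5 * (lo + hi)


eps = 1e-3
p_star = math.log(1.0 / eps)
# Grid search over p > 1 for the worst-case exponent (largest threshold).
grid = [1.001 + 0.001 * i for i in range(20000)]
p_best = max(grid, key=lambda p: rho_threshold(p, eps))
rho = rho_threshold(p_star, eps)
u2 = penalty_minimizer(p_star, rho)
u1 = abs(u2) ** p_star  # boundary of U: |u2|^p = u1
u2_num = ternary_min(lambda t: t + 0.5 * rho * abs(t) ** (2.0 * p_star), -1.0, 0.0)
print(p_best, p_star)                            # grid maximizer vs ln(1/eps)
print(rho, 1.0 / (math.e * p_star * eps ** 2))   # both sides of the final estimate
print(u1, u2, u2_num)                            # u1 equals eps at the threshold
```

At the threshold value of $\rho$, the first component of $u^*_\rho$ equals $\varepsilon$ exactly, and $(u^*_\rho)_2 = -e^{-1}$ for $p = p^*$.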


4.2 Fast gradient nondifferentiable penalty method 

Given the nonsmoothness of the penalty function $\phi_\rho$, we replace the nonsmooth term $\mathrm{dist}_{\mathcal{K}}(G\cdot + g)$ with a basic smooth approximation. Thus, for a given smoothing parameter $\mu > 0$, we replace the original problem with the following smooth problem:

$$\min_{u\in U}\ \phi_{\rho,\mu}(u) \quad \left(= f(u) + \rho\sqrt{\mathrm{dist}_{\mathcal{K}}(Gu + g)^2 + \mu^2}\right). \qquad (32)$$

Note that if $\nabla f$ is Lipschitz continuous with constant $L_f > 0$, then $\nabla\phi_{\rho,\mu}$ is Lipschitz continuous with constant $L_\phi = L_f + \frac{\rho\|G\|^2}{\mu}$. We denote $u^*_{\rho,\mu} \in \arg\min_{u\in U}\phi_{\rho,\mu}(u)$ and, for simplicity, assume that $\Delta^* \ge \varepsilon$ (otherwise some minor changes in constants will occur).
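A minimal sketch of the smoothed penalty (32) and its gradient, under illustrative assumptions: $f(u) = \frac{1}{2}\|u - c\|^2$ and a single conic constraint $Gu + g = b - \langle a, u\rangle \in \mathcal{K} = \mathbb{R}_+$. The gradient follows from the chain rule, $\nabla\phi_{\rho,\mu}(u) = \nabla f(u) + \rho\,G^T\frac{z - \Pi_{\mathcal{K}}(z)}{\sqrt{\mathrm{dist}_{\mathcal{K}}(z)^2 + \mu^2}}$ with $z = Gu + g$; the function names are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]


def phi_and_grad(u: Vector, c: Vector, a: Vector, b: float,
                 rho: float, mu: float) -> Tuple[float, Vector]:
    """Value and gradient of phi_{rho,mu}(u) = f(u) + rho*sqrt(dist_K(Gu+g)^2 + mu^2).

    Assumed instance: f(u) = 0.5*||u - c||^2 and one conic constraint
    Gu + g = b - <a, u> in K = R_+ (i.e. G = -a^T, g = b).
    """
    z = b - sum(ai * ui for ai, ui in zip(a, u))   # Gu + g
    neg = min(z, 0.0)                              # z - Proj_K(z); |neg| = dist_K(z)
    root = math.sqrt(neg * neg + mu * mu)
    val = 0.5 * sum((ui - ci) ** 2 for ui, ci in zip(u, c)) + rho * root
    # Chain rule: (u - c) + rho * G^T (z - Proj_K(z)) / root, with G^T = -a.
    grad = [(ui - ci) - rho * (neg / root) * ai for ui, ci, ai in zip(u, c, a)]
    return val, grad


def lipschitz_phi(L_f: float, rho: float, mu: float, G_norm_sq: float) -> float:
    """Gradient Lipschitz constant L_phi = L_f + rho * ||G||^2 / mu."""
    return L_f + rho * G_norm_sq / mu


if __name__ == "__main__":
    print(phi_and_grad([0.7, 0.6], [1.0, 1.0], [1.0, 1.0], 0.0, rho=3.0, mu=0.05))
```

The factor $1/\mu$ in `lipschitz_phi` is what drives the choice $\mu = \varepsilon/2$ in (33) to appear in the final complexity.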

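Theorem 4.2 Under Assumptions 2.1 with $\sigma_f = 0$ and 2.2, let $\rho, \mu, \varepsilon > 0$ and the sequence $(u^k, v^k)_{k\ge0}$ be generated by Algorithm ICFG($\phi_{\rho,\mu}$, 0, 0, $L_\phi$) with $\theta_{k+1} = \frac{1+\sqrt{1+4\theta_k^2}}{2}$ for all $k \ge 1$. If the following conditions hold:

$$\rho = \frac{2\Delta^*}{\varepsilon} + 1 \quad \text{and} \quad \mu = \frac{\varepsilon}{2}, \qquad (33)$$

and $f$ is simple (convention $L_f = 0$) or $\nabla f$ is Lipschitz continuous ($L_f > 0$), then after

$$k = \left\lceil\left(\frac{2L_fD_U^2}{\varepsilon}\right)^{1/2} + \frac{2\|G\|D_U}{\varepsilon}\left(\frac{2\Delta^*}{\varepsilon} + 1\right)^{1/2}\right\rceil$$

projections onto the primal simple feasible set $\mathcal{K} \times U$, we have:

$$-\Delta^* \le f(u^k) - f^* \le \varepsilon, \qquad \mathrm{dist}_{\mathcal{K}}(Gu^k + g) \le \varepsilon.$$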


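Proof. Let $u \in U$ be an $\varepsilon$-optimal point for the smoothed penalty problem (32), satisfying $\phi_{\rho,\mu}(u) - \phi^*_{\rho,\mu} \le \varepsilon$, i.e., we have:

$$f(u) + \rho\sqrt{\mathrm{dist}_{\mathcal{K}}(Gu + g)^2 + \mu^2} - f(u^*_{\rho,\mu}) - \rho\sqrt{\mathrm{dist}_{\mathcal{K}}(Gu^*_{\rho,\mu} + g)^2 + \mu^2} \le \varepsilon. \qquad (34)$$

First, relation (34) implies the following:

$$\begin{aligned}
f(u) - f^* &\le f(u) + \rho\sqrt{\mathrm{dist}_{\mathcal{K}}(Gu + g)^2 + \mu^2} - f^* - \rho\mu\\
&\le f(u) + \rho\sqrt{\mathrm{dist}_{\mathcal{K}}(Gu + g)^2 + \mu^2} - f(u^*_{\rho,\mu}) - \rho\sqrt{\mathrm{dist}_{\mathcal{K}}(Gu^*_{\rho,\mu} + g)^2 + \mu^2} \le \varepsilon,
\end{aligned} \qquad (35)$$

where the second inequality uses $\phi^*_{\rho,\mu} \le \phi_{\rho,\mu}(u^*) = f^* + \rho\mu$. Second, from (34) we have the following feasibility relation:

$$\begin{aligned}
\mathrm{dist}_{\mathcal{K}}(Gu + g) &\le \sqrt{\mathrm{dist}_{\mathcal{K}}(Gu + g)^2 + \mu^2}\\
&\le \frac{f(u^*_{\rho,\mu}) + \rho\sqrt{\mathrm{dist}_{\mathcal{K}}(Gu^*_{\rho,\mu} + g)^2 + \mu^2} - f(u) + \varepsilon}{\rho}\\
&\le \frac{f^* - f_* + \varepsilon}{\rho} + \mu = \frac{\Delta^* + \varepsilon}{\rho} + \mu.
\end{aligned}$$

Therefore, choosing the parameters according to (33), any point satisfying (34) is $\varepsilon$-optimal in the sense of the optimality criteria (29). Given arbitrary $\rho, \mu > 0$, Algorithm ICFG($\phi_{\rho,\mu}$, 0, 0, $L_\phi$) applied to the smoothed problem (32) generates primal sequences $(u^k, v^k)_{k\ge0}$ satisfying the following convergence rate (see Theorem 2.3):

$$\phi_{\rho,\mu}(u^k) - \phi^*_{\rho,\mu} \le \frac{2\left(L_f + \frac{\rho\|G\|^2}{\mu}\right)D_U^2}{k^2}.$$

Thus, $\varepsilon$-suboptimality for problem (32) is attained after at most

$$\left\lceil\left(\frac{2L_fD_U^2}{\varepsilon}\right)^{1/2} + \left(\frac{2\rho\|G\|^2D_U^2}{\mu\varepsilon}\right)^{1/2}\right\rceil$$

projections onto the set $\mathcal{K} \times U$. Taking into account the assumptions (33), we obtain the computational complexity estimate given in the theorem. ■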


Remark 4 It is easy to prove that if the objective function $f$ is strongly convex, i.e., it satisfies Assumption 2.1 with $\sigma_f > 0$, then the differentiable and nondifferentiable penalty methods from the previous sections have computational complexity in the last primal point $u^k$ of order $\mathcal{O}\left(\frac{1}{\varepsilon}\log\frac{1}{\varepsilon}\right)$ projections onto the set $\mathcal{K} \times U$. ■

From the previous discussion it follows that the optimal penalty parameter $\rho$ depends on $\Delta^*$, which in general is unknown a priori. Therefore, in the next section we introduce implementable variants of the previous first order penalty methods, which approximate $\Delta^*$ at each iteration but maintain the same optimal computational complexities as those given in the previous theorems (up to a logarithmic factor).


4.3 Adaptive fast gradient penalty method 

In this section, regardless of the type of penalty function, we introduce an Adaptive Penalty Method (A-PM), which relies on a sequential increase of the penalty parameter $\rho$ until a satisfactory value is attained.

Algorithm A-PM ($\rho_0, \varepsilon, s$)

1. Set $k = 0$ and choose $u_0 \in U$. If $s$ = "N", choose $\mu > 0$. For $k \ge 0$ do:

2. Apply Algorithm ICFG to the (smoothed) penalty subproblem and find $u^k$ such that:
$\psi_{\rho_k}(u^k) - \psi^*_{\rho_k} \le \varepsilon$, if $s$ = "D";
$\phi_{\rho_k,\mu}(u^k) - \phi^*_{\rho_k,\mu} \le \varepsilon$, if $s$ = "N".

3. If the iterate $u^k$ satisfies $\mathrm{dist}_{\mathcal{K}}(Gu^k + g) \le \varepsilon$, then STOP. Otherwise, set $\rho_{k+1} = 2\rho_k$, $k = k + 1$ and go to step 2.
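The loop of Algorithm A-PM with $s$ = "D" (read here as the differentiable penalty (23)) can be sketched on the same kind of hypothetical toy instance used above: $f(u) = \frac{1}{2}\|u - c\|^2$ with $c = (1, 1)$, $U = [-1, 1]^2$ and the constraint $u_1 + u_2 \le 0$, i.e., $-(u_1 + u_2) \in \mathcal{K} = \mathbb{R}_+$. An accelerated projected gradient loop replaces Algorithm ICFG, and its iteration budget is chosen so that the fast gradient bound guarantees $\psi_{\rho_k}(u^k) - \psi^*_{\rho_k} \le \varepsilon$; the function names are illustrative.

```python
import math
from typing import Callable, List, Tuple

Vector = List[float]


def fista_box(grad: Callable[[Vector], Vector], L: float, u0: Vector,
              iters: int, lo: float = -1.0, hi: float = 1.0) -> Vector:
    """Accelerated projected gradient on the box [lo, hi]^n with step 1/L."""
    u_prev, y, theta = list(u0), list(u0), 1.0
    for _ in range(iters):
        u = [max(lo, min(hi, yi - gi / L)) for yi, gi in zip(y, grad(y))]
        theta_next = (1.0 + math.sqrt(1.0 + 4.0 * theta * theta)) / 2.0
        beta = (theta - 1.0) / theta_next
        y = [ui + beta * (ui - upi) for ui, upi in zip(u, u_prev)]
        u_prev, theta = u, theta_next
    return u_prev


def a_pm_quadratic(rho0: float, eps: float,
                   max_outer: int = 40) -> Tuple[Vector, float, int]:
    """A-PM with s = "D" on the toy instance described above."""
    c = [1.0, 1.0]
    D_U = 2.0 * math.sqrt(2.0)           # diameter of the box U
    rho, u = rho0, [0.0, 0.0]
    for k in range(max_outer):
        L_psi = 1.0 + 2.0 * rho          # L_f + rho * ||G||^2, with ||G||^2 = 2
        # Enough accelerated steps to guarantee psi_rho(u) - psi_rho^* <= eps.
        iters = math.ceil(math.sqrt(2.0 * L_psi * D_U ** 2 / eps))

        def grad_psi(v: Vector, rho: float = rho) -> Vector:
            viol = max(v[0] + v[1], 0.0)  # dist_K(Gv + g)
            return [(v[0] - c[0]) + rho * viol, (v[1] - c[1]) + rho * viol]

        u = fista_box(grad_psi, L_psi, u, iters)   # step 2 (warm start)
        if max(u[0] + u[1], 0.0) <= eps:            # step 3: feasibility test
            return u, rho, k
        rho *= 2.0                                  # rho_{k+1} = 2 rho_k
    return u, rho, max_outer


if __name__ == "__main__":
    u, rho, k = a_pm_quadratic(rho0=1.0, eps=0.1)
    print(u, rho, k)
```

Since $\Delta^* = 1$ here, Theorem 4.1 guarantees that the loop stops at the latest once $\rho_k \ge 4\Delta^*/\varepsilon^2$, so the final penalty parameter never exceeds twice that value; in practice it often stops earlier.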


In the previous sections we have seen that, in the general case, when optimal Lagrange multipliers do not necessarily exist, there is a penalty parameter $\bar{\rho}$ depending on the type of penalty function, i.e.,

$$\bar{\rho} = \begin{cases}\dfrac{4\Delta^*}{\varepsilon^2} & \text{for the smooth penalty},\\[2mm] \dfrac{2\Delta^*}{\varepsilon} + 1 & \text{for the nonsmooth penalty},\end{cases}$$

such that if $\rho_k \ge \bar{\rho}$ and $\varepsilon < \Delta^*/2$, then $u^k$ satisfies (29) and the algorithm stops. Further, we provide the computational complexity of Algorithm A-PM in the case when $\nabla f$ is Lipschitz continuous with constant $L_f > 0$. The complexity results for the case when $f$ is simple can be derived similarly.


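Theorem 4.3 Under the assumptions of Theorem [?], let $\rho_0, \epsilon > 0$ and let the sequence $(u^k)_{k \ge 0}$ be generated by Algorithm A-PM$(\rho_0, \epsilon, s)$. For the nondifferentiable penalty case, assume $\mu = \epsilon$. After a total number of projections onto $\mathcal{K} \times U$ given by:

$$\begin{cases} N_{out}\left(\dfrac{4L_fD_U^2}{\epsilon}\right)^{1/2} + \dfrac{24(\rho_0\lambda^*)^{1/2}\|G\|D_U}{\epsilon^{3/2}} + 1 & \text{for smooth penalty},\\[6pt] N_{out}\left(\dfrac{4L_fD_U^2}{\epsilon}\right)^{1/2} + \dfrac{30(\rho_0\lambda^*)^{1/2}\|G\|D_U}{\epsilon^{3/2}} + 1 & \text{for nonsmooth penalty},\end{cases}$$

where $N_{out}$ denotes the number of outer iterations of A-PM (logarithmic in $1/\epsilon$, see the proof), the primal point $u^k$ satisfies primal suboptimality $f(u^k) - f^* \le \epsilon$ and primal infeasibility $\mathrm{dist}_{\mathcal{K}}(Gu^k + g) \le \epsilon$.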

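Proof. The proof follows similar lines as that of Theorem 3.9. It can easily be seen that, independently of the assumptions on the objective function $f$, Algorithm A-PM requires at most $N_{out}$ outer steps to attain an $\epsilon$-optimal point, where $N_{out}$ grows only logarithmically since the penalty parameter is doubled at each outer step until it exceeds the threshold $\rho^*$. Taking into account that in the nonsmooth case we apply the classical smoothing strategy from Section 4.2, the iteration complexity for solving the inner subproblem at outer iteration $k$ can be bounded by:

$$N^{in}_k \le \begin{cases} \left(\dfrac{2L_fD_U^2}{\epsilon}\right)^{1/2} + \rho_k^{1/2}\,\dfrac{2\|G\|D_U}{\epsilon^{1/2}} & \text{for smooth penalty},\\[6pt] \left(\dfrac{2L_fD_U^2}{\epsilon}\right)^{1/2} + \rho_k^{1/2}\,\dfrac{\sqrt{2}\,\|G\|D_U}{(\mu\epsilon)^{1/2}} & \text{for nonsmooth penalty},\end{cases}$$

where $\mu > 0$ is the smoothing parameter. Knowing the maximal number of outer stages, the total number of fast gradient iterations can be computed by summing all quantities $N^{in}_k$, $k = 0, \dots, N_{out}$. Observing that $\rho_k = \rho_0 2^k$, we obtain the following bound on the overall complexity:

$$\sum_{k=0}^{N_{out}} N^{in}_k \le \begin{cases} N_{out}\left(\dfrac{4L_fD_U^2}{\epsilon}\right)^{1/2} + \dfrac{24(\rho_0\lambda^*)^{1/2}\|G\|D_U}{\epsilon^{3/2}} + 1 & \text{for smooth penalty},\\[6pt] N_{out}\left(\dfrac{4L_fD_U^2}{\epsilon}\right)^{1/2} + \dfrac{30(\rho_0\lambda^*)^{1/2}\|G\|D_U}{\epsilon^{3/2}} + 1 & \text{for nonsmooth penalty},\end{cases}$$

which proves the statements of the theorem. ■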
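The summation step relies on the doubling rule $\rho_k = \rho_0 2^k$: the terms $\rho_k^{1/2}$ form a geometric series with ratio $\sqrt{2}$, so their sum is at most $(2+\sqrt{2})\,\rho_{N_{out}}^{1/2}$, i.e., a constant multiple of the last stage. A short Python check of this elementary fact (the function names are illustrative only):

```python
import math

def sum_sqrt_rho(rho0, n_out):
    """sum_{k=0}^{n_out} sqrt(rho_k) with rho_k = rho0 * 2**k (doubling in A-PM)."""
    return sum(math.sqrt(rho0 * 2.0**k) for k in range(n_out + 1))

def sum_sqrt_rho_closed_form(rho0, n_out):
    """Geometric series with ratio r = sqrt(2): sqrt(rho0) (r^(n_out+1) - 1) / (r - 1)."""
    r = math.sqrt(2.0)
    return math.sqrt(rho0) * (r ** (n_out + 1) - 1.0) / (r - 1.0)

def last_stage_bound(rho0, n_out):
    """(2 + sqrt(2)) sqrt(rho_{n_out}): the whole sum is dominated by the last stage."""
    return (2.0 + math.sqrt(2.0)) * math.sqrt(rho0 * 2.0**n_out)

for rho0, n_out in [(1.0, 0), (0.5, 7), (3.0, 20)]:
    print(sum_sqrt_rho(rho0, n_out), sum_sqrt_rho_closed_form(rho0, n_out),
          last_stage_bound(rho0, n_out))
```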

Remark 5 If we assume that there exists a bound $R_p$ such that $\|u^0 - u^*\| \le R_p < \infty$ for all $u^* \in \arg\min_{u \in U} \psi_\rho(u)$ (or $\phi_{\mu,\rho}(u)$), then we can remove the boundedness assumption on $U$ (i.e., Assumption 2.2(ii)), and all the previous complexity results hold by replacing $D_U$ with $R_p$. ■

In conclusion, if we do not assume the existence of an optimal Lagrange multiplier that closes the duality gap for the cone constrained convex problem (1), the computational complexity of fast gradient penalty methods is, in the worst case, of order $\mathcal{O}(\epsilon^{-3/2})$. Moreover, these bounds are tight, as Example 1 shows.


5. Comparisons with previous work 

We now present a brief comparison of our computational complexity results on Lagrangian and penalty methods with previous complexity results from the literature under various optimality criteria. We start by comparing the computational complexity results on (fast) gradient augmented Lagrangian methods in the optimality criteria used in this paper: $|f(u_\epsilon) - f^*| \le \epsilon$ and $\mathrm{dist}_{\mathcal{K}}(Gu_\epsilon + g) \le \epsilon$. Various first order augmented Lagrangian methods have been developed in, e.g., [?, ?], and computational complexity estimates of order $\mathcal{O}(1/\epsilon)$ have been obtained in our criteria. For example, in [?] an adaptive augmented Lagrangian method for cone constrained convex optimization models was analyzed. The authors of [?] prove that the outer complexity is of order $\mathcal{O}(\log(1/\epsilon))$ and that the inner accuracy is constrained to be of order $\delta_k = \mathcal{O}(\beta^{-k})$, where $\beta > 1$; thus the overall complexity is similar to the estimates given in our paper (up to a logarithmic factor). However, our augmented Lagrangian algorithms can be easily implemented in practice, their parameters are easy to compute, and our analysis based on the inexact oracle framework is simpler and more intuitive than those given in [?, ?], opening various possibilities for extensions to more complex optimization models.

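On the other hand, Lan et al. [13] considered the linear equality constrained case (i.e., $\mathcal{K} = \{0\}$) and used another set of $\epsilon$-optimality criteria: any $u_\epsilon \in U$ is $\epsilon$-optimal if there exists $x_\epsilon \in \mathbb{R}^m$ satisfying:

$$\nabla f(u_\epsilon) + G^T x_\epsilon \in -\mathcal{N}_U(u_\epsilon) + \mathcal{B}_\epsilon(0) \quad \text{and} \quad \|Gu_\epsilon + g\| \le \epsilon, \qquad (36)$$

where $\mathcal{N}_U(u_\epsilon)$ denotes the normal cone to $U$ at $u_\epsilon$ and $\mathcal{B}_\epsilon(0)$ the ball of radius $\epsilon$ centered at the origin. In these criteria, without any regularization of the original problem, the gradient augmented Lagrangian algorithm I-AL introduced in [13] has computational complexity of order $\mathcal{O}(\epsilon^{-7/4})$. We further show that, using our approach, we obtain a suboptimal point satisfying (36) with a much better iteration complexity for the same algorithm I-AL.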


More precisely, with our analysis, Theorem 3.4 implies that the I-AL method from [13] should perform $N_{out} = \left\lceil \frac{16R_d}{3\mu\epsilon} \right\rceil$ outer iterations with an appropriately small inner accuracy $\delta$. Therefore, denoting by $N_{in}$ the inner complexity, the first stage of the I-AL method of [13] requires $N_{out}N_{in}$ projections onto $U$, while the postprocessing procedure in I-AL of [13] performs a number of projections onto $U$ that depends on $L_f + \mu\|G\|^2$ and $D_U$. Using these bounds, for any $\mu$ above a certain threshold, the total number of projections required by the I-AL method in [13] can be bounded with our analysis, and for an optimal complexity the smoothing parameter $\mu$ is chosen to balance the two stages. With this choice, the I-AL method from [13] performs, with our analysis, a number of projections onto $U$ of order $\mathcal{O}(\epsilon^{-3/2})$ (with constants depending on $\|G\|$, $R_d$, $D_U$ and $L_f$) for attaining an $\epsilon$-optimal point w.r.t. optimality criteria (36).

Moreover, using a straightforward modification of the first stage of the I-AL method, namely replacing the outer dual gradient method with an outer dual fast gradient method, we can obtain a fast I-AL method. From Theorem 3.7 we have that the fast I-AL method performs $N_{out} = \left\lceil \left(\frac{8R_d}{\mu\epsilon}\right)^{1/2} \right\rceil$ outer iterations with an appropriately small inner accuracy $\delta$ to attain an $\epsilon$-optimal point satisfying (36). Using the same reasoning as in the previous case, the first stage of the fast I-AL method requires $N_{out}N_{in}$ projections onto $U$, and the postprocessing procedure again performs a number of projections onto $U$ that depends on $L_f + \mu\|G\|^2$ and $D_U$. Using these bounds, for any $\mu$ above a certain threshold, the total number of projections required by the fast I-AL method can be bounded, and $\mu$ can again be chosen to attain the optimal complexity. With this choice, the fast I-AL method performs, with our analysis, $\mathcal{O}(\epsilon^{-4/3})$ projections onto $U$.

In conclusion, based on our settings we obtain computational complexities of order $\mathcal{O}(\epsilon^{-3/2})$ for the original I-AL method and of order $\mathcal{O}(\epsilon^{-4/3})$ for the fast I-AL method, which are significantly better than the estimate $\mathcal{O}(\epsilon^{-7/4})$ given in [13] for optimality criteria (36). Moreover, in our optimality criteria defined in Section 2, we have seen that for an optimal smoothing parameter both the classical and the fast augmented Lagrangian methods have the same complexity, while in optimality criteria (36) the fast I-AL method has the best overall complexity. Finally, we can combine our approach with a regularization technique, i.e., the addition of a strongly convex quadratic term in $\|u - u^0\|^2$ to the objective function, used e.g. in [13], and also obtain a computational complexity (for the last primal point) of order $\mathcal{O}(\epsilon^{-1})$ in optimality criteria (36). Due to space limitations we omit these derivations.

Outer complexity estimates of order $\mathcal{O}(1/\epsilon)$ for fast gradient Nesterov-type smoothing methods were derived, e.g., in [2, ?, 10]. From our previous analysis we can conclude that, for an adequate choice of the parameter $\mu$, the number of outer iterations is only one; therefore, the outer complexity estimates are irrelevant to the total complexity of the method. Thus, we need to derive overall complexities, as we do in this paper.

Finally, there are very few iteration complexity results for first order methods for convex problems that might not have a Lagrange multiplier closing the duality gap. Recently, Nesterov proposed a specialized subgradient method for solving directly general nonsmooth convex problems with functional constraints without assuming the existence of bounded optimal Lagrange multipliers [22]. The specialized subgradient method in [22] requires $\mathcal{O}(1/\epsilon^2)$ total subgradient computations for either the objective function or a functional constraint. In [9] the classical quadratic penalty scheme is combined with Nesterov's optimal method for solving a general conic problem, but under the strong assumption of the existence of optimal Lagrange multipliers. If the objective function is smooth, then the quadratic penalty method requires $\mathcal{O}(\ldots)$ projections onto the simple convex set and onto the cone to attain an $\epsilon$-solution satisfying a criterion given in terms of a set of KKT conditions. On the other hand, using a regularization strategy for the original problem, the quadratic penalty method requires $\mathcal{O}(\frac{1}{\epsilon}\log\frac{1}{\epsilon})$ projections to attain an $\epsilon$-solution for the same criterion. Therefore, the assumption on the existence of an optimal Lagrange multiplier improves the iteration complexity of a quadratic penalty method from $\mathcal{O}(\epsilon^{-3/2})$ (see Section 4) to $\mathcal{O}(1/\epsilon)$ total iterations. Moreover, for this particular setting, one can guarantee that the suboptimality estimates hold on both sides with arbitrary accuracy, whereas in our setting only the right-hand side can be made arbitrarily small. In conclusion, the price we pay for tackling a more general conic convex problem is the additional computational effort and the fact that the function value represents a lower approximation of the optimal value.


6. Appendix 

In this section we provide proofs for Theorems 3.4 and 3.7.

Appendix A.1

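Proof of Theorem 3.4. We derive sublinear estimates for primal infeasibility and primal suboptimality for the average primal point $\hat u^k = \frac{1}{k}\sum_{j=0}^{k-1} u^j$, where $u^j = \tilde u_\mu(x^j)$ and $x^j$ is generated by Algorithm ICFG$(d^{ag}_\mu, 0, 3\delta, 2L_d)$ with $\theta_k = 1$ for all $k \ge 0$. First, given the definition of $x^{j+1}$ in Algorithm ICFG, we get:

$$x^{j+1} = x^j + \frac{1}{2L_d}\nabla_x\mathcal{L}^{ag}_\mu(u^j, x^j) \quad \forall j \ge 0.$$

Subtracting $x^j$ from both sides and adding up these equalities for $j = 0, \dots, k-1$, we get:

$$\frac{1}{k}\sum_{j=0}^{k-1}\nabla_x\mathcal{L}^{ag}_\mu(u^j, x^j) = \frac{2L_d}{k}\left(x^k - x^0\right).$$

Note that $\nabla_x\mathcal{L}^{ag}_\mu(u^j, x^j) = Gu^j + g - \left[Gu^j + g + \frac{1}{\mu}x^j\right]_{\mathcal{K}}$. Using the notation $z^j = \left[Gu^j + g + \frac{1}{\mu}x^j\right]_{\mathcal{K}}$, we have $\frac{1}{k}\sum_{j=0}^{k-1} z^j \in \mathcal{K}$, since $\mathcal{K}$ is a convex cone. This fact implies:

$$\mathrm{dist}_{\mathcal{K}}(G\hat u^k + g) \le \Big\|\frac{1}{k}\sum_{j=0}^{k-1}\left(Gu^j + g - z^j\right)\Big\| = \frac{2L_d}{k}\left\|x^k - x^0\right\|. \qquad (37)$$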

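It remains to bound $\|x^k - x^0\|$. Using the iteration of ICFG, for $x \in \mathcal{K}^*$, we get:

$$\begin{aligned}\|x^{k+1} - x\|^2 &= \|x^k - x\|^2 + 2\langle x^{k+1} - x^k, x^{k+1} - x\rangle - \|x^{k+1} - x^k\|^2\\ &= \|x^k - x\|^2 + \frac{1}{L_d}\langle\nabla_x\mathcal{L}^{ag}_\mu(u^k, x^k), x^k - x\rangle\\ &\quad + \frac{1}{L_d}\left(\langle\nabla_x\mathcal{L}^{ag}_\mu(u^k, x^k), x^{k+1} - x^k\rangle - L_d\|x^{k+1} - x^k\|^2\right) \qquad (38)\\ &\le \|x^k - x\|^2 + \frac{1}{L_d}\left(d^{ag}_\mu(x^{k+1}) - d^{ag}_\mu(x)\right) + \frac{3\delta}{L_d} \quad \forall k \ge 0.\end{aligned}$$

Taking $x = x^*$ in the last inequality (so that $d^{ag}_\mu(x^{k+1}) - d^{ag}_\mu(x^*) \le 0$) and using an inductive argument, we get:

$$\|x^k - x^0\| \le \|x^k - x^*\| + \|x^0 - x^*\| \le 2\|x^0 - x^*\| + \sqrt{\frac{3k\delta}{L_d}}.$$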

We substitute this bound into (37) and obtain the estimate on primal infeasibility:

$$\mathrm{dist}_{\mathcal{K}}(G\hat u^k + g) \le \frac{4L_dR_d}{k} + \frac{2L_d}{k}\sqrt{\frac{3k\delta}{L_d}} = \frac{4L_dR_d}{k} + \sqrt{\frac{12L_d\delta}{k}}. \qquad (39)$$
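To illustrate (37) and (39) in the simplest setting, the sketch below assumes a toy problem that is not taken from the paper: $\mathcal{K} = \{0\}$ (equality constraints), $U = \mathbb{R}^n$ (so the boundedness of $U$ is dropped for the illustration), $f(u) = \frac{1}{2}\|u - c\|^2$, and exact inner minimization ($\delta = 0$). Then the augmented dual gradient at $x$ is $Gu + g$ with $u$ the inner minimizer, its Lipschitz constant is $L_d = 1/\mu$, and (39) reduces to $\|G\hat u^k + g\| \le 4L_dR_d/k$ with $R_d = \|x^0 - x^*\|$. The helper `dual_gradient_al` is hypothetical.

```python
import numpy as np

def dual_gradient_al(c, G, g, mu, iters):
    """Augmented Lagrangian dual gradient ascent for
        min 0.5 ||u - c||^2  s.t.  G u + g = 0   (K = {0}, U = R^n),
    with exact inner minimization (delta = 0) and step 1/(2 L_d), L_d = 1/mu.
    Returns the last multiplier and the infeasibility of the averaged primal points."""
    m, n = G.shape
    x = np.zeros(m)                               # x^0 = 0
    H = np.eye(n) + mu * G.T @ G                  # Hessian of the inner problem in u
    L_d = 1.0 / mu                                # Lipschitz constant of the dual gradient
    u_sum = np.zeros(n)
    infeas = []
    for k in range(1, iters + 1):
        # argmin_u 0.5||u - c||^2 + <x, Gu + g> + (mu/2)||Gu + g||^2
        u = np.linalg.solve(H, c - G.T @ (x + mu * g))
        grad = G @ u + g                          # gradient of the augmented dual at x
        x = x + grad / (2.0 * L_d)
        u_sum += u
        infeas.append(np.linalg.norm(G @ (u_sum / k) + g))  # ||G u_hat^k + g||
    return x, infeas

rng = np.random.default_rng(0)
G = rng.standard_normal((2, 4))
c = rng.standard_normal(4)
g = rng.standard_normal(2)
mu = 0.5
x_star = np.linalg.solve(G @ G.T, G @ c + g)      # optimal multiplier of the toy problem
R_d = np.linalg.norm(x_star)                      # = ||x^0 - x^*|| since x^0 = 0
x, infeas = dual_gradient_al(c, G, g, mu, iters=200)
L_d = 1.0 / mu
print(max(infeas[k - 1] * k / (4.0 * L_d * R_d) for k in range(1, 201)))  # at most 1
```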


It remains to derive the estimates on primal suboptimality. First, we observe that $d^{ag}_\mu(x) \le f^*$ for any $x$, and that for any $u \in U$ the following identity holds:

$$\mathcal{L}^{ag}_\mu(u, x) - \langle\nabla_x\mathcal{L}^{ag}_\mu(u, x), x\rangle = f(u) + \frac{\mu}{2}\left\|\nabla_x\mathcal{L}^{ag}_\mu(u, x)\right\|^2. \qquad (40)$$
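Identity (40) can be checked numerically. The sketch assumes that the augmented Lagrangian has the form $\mathcal{L}^{ag}_\mu(u,x) = f(u) + \frac{\mu}{2}\mathrm{dist}^2_{\mathcal{K}}(Gu + g + x/\mu) - \frac{1}{2\mu}\|x\|^2$, which is consistent with the gradient formula used in the proof above, and takes $\mathcal{K} = \mathbb{R}^m_+$ so that the projection is a componentwise maximum. The vector `z` stands for $Gu + g$, and $f(u)$ is a fixed number since it cancels; all names are illustrative.

```python
import numpy as np

def proj_K(z):
    """Projection onto the assumed cone K = nonnegative orthant."""
    return np.maximum(z, 0.0)

def aug_lagrangian(f_u, z, x, mu):
    """L_mu(u, x) = f(u) + (mu/2) dist_K^2(z + x/mu) - ||x||^2 / (2 mu), with z = G u + g."""
    w = z + x / mu
    return f_u + 0.5 * mu * np.sum((w - proj_K(w)) ** 2) - (x @ x) / (2.0 * mu)

def grad_x(z, x, mu):
    """Gradient in x: G u + g - [G u + g + x/mu]_K."""
    return z - proj_K(z + x / mu)

rng = np.random.default_rng(1)
z, x, mu, f_u = rng.standard_normal(5), rng.standard_normal(5), 0.7, 1.3
gr = grad_x(z, x, mu)
lhs = aug_lagrangian(f_u, z, x, mu) - gr @ x
rhs = f_u + 0.5 * mu * (gr @ gr)
print(lhs, rhs)  # the two sides of (40) agree up to rounding
```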

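Based on the previous discussion, from (38) and (40) we derive that:

$$\begin{aligned}\|x^{k+1} - x\|^2 &\le \|x^k - x\|^2 + \frac{1}{L_d}\left(d^{ag}_\mu(x^{k+1}) - \mathcal{L}^{ag}_\mu(u^k, x^k) + \langle\nabla_x\mathcal{L}^{ag}_\mu(u^k, x^k), x^k - x\rangle + 3\delta\right)\\ &\le \|x^k - x\|^2 + \frac{1}{L_d}\left(f^* - f(u^k) - \frac{\mu}{2}\left\|\nabla_x\mathcal{L}^{ag}_\mu(u^k, x^k)\right\|^2 + 3\delta - \langle\nabla_x\mathcal{L}^{ag}_\mu(u^k, x^k), x\rangle\right).\end{aligned}$$

Taking now $x = 0$, using an inductive argument over $j = 0, \dots, k-1$ and the convexity of $f$, we obtain:

$$f(\hat u^k) - f^* \le \frac{L_d\|x^0\|^2}{k} + 3\delta. \qquad (41)$$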


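On the other hand, to bound $f(\hat u^k) - f^*$ from below we proceed as follows:

$$\begin{aligned} f^* &= \min_{u \in U,\, r \in \mathcal{K}} f(u) + \langle x^*, Gu + g - r\rangle \le f(\hat u^k) + \left\langle x^*, G\hat u^k + g - \left[G\hat u^k + g\right]_{\mathcal{K}}\right\rangle\\ &\le f(\hat u^k) + \|x^*\|\left\|G\hat u^k + g - \left[G\hat u^k + g\right]_{\mathcal{K}}\right\| = f(\hat u^k) + \|x^*\|\,\mathrm{dist}_{\mathcal{K}}(G\hat u^k + g). \qquad (42)\end{aligned}$$

Combining (39) with (42) and then with (41), we obtain the estimate on primal suboptimality stated in the theorem. ■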


Appendix A.2 

Proof of Theorem 3.7. We derive sublinear estimates for primal infeasibility and suboptimality of the average primal point $\hat u^k = \frac{1}{\theta_{k-1}^2}\sum_{j=1}^{k-1}\theta_j u^j$, where $u^j = \tilde u_\mu(y^j)$ and $(x^j, y^j)$ are generated by Algorithm ICFG$(d^{ag}_\mu, 0, 3\delta, 2L_d)$ with $\theta_{k+1} = \frac{1 + \sqrt{1 + 4\theta_k^2}}{2}$ for all $k \ge 1$. We observe that $\frac{k+1}{2} \le \theta_k \le k$ and $\theta_k^2 - \theta_k = \theta_{k-1}^2$. We denote $l^k = x^{k-1} + \theta_k(x^k - x^{k-1})$ and recall that the following relation has been proved in [13, 26]:

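$$\theta_{k-1}^2\left(d^{ag}_\mu(x) - d^{ag}_\mu(x^k)\right) + \sum_{i=1}^{k-1}\theta_i\Delta(x, y^i) + L_d\|l^k - x\|^2 \le L_d\|x^0 - x\|^2 + 3\sum_{i=1}^{k-1}\theta_i^2\delta, \qquad (43)$$

where $\Delta(x, y) = \mathcal{L}^{ag}_\mu(\tilde u_\mu(y), y) + \langle\nabla_x\mathcal{L}^{ag}_\mu(\tilde u_\mu(y), y), x - y\rangle - d^{ag}_\mu(x)$. Now we are ready to prove Theorem 3.7. From the definition of the augmented dual function $d^{ag}_\mu$, it can be seen that $x^k = y^k + \frac{1}{2L_d}\nabla_x\mathcal{L}^{ag}_\mu(u^k, y^k)$. Multiplying by $\theta_k$, we obtain: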

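$$\begin{aligned}\frac{\theta_k}{2L_d}\nabla_x\mathcal{L}^{ag}_\mu(u^k, y^k) &= \theta_k(x^k - y^k) = \theta_k(x^k - x^{k-1}) + (\theta_{k-1} - 1)(x^{k-2} - x^{k-1})\\ &= \underbrace{x^{k-1} + \theta_k(x^k - x^{k-1})}_{l^k} - \underbrace{\left(x^{k-2} + \theta_{k-1}(x^{k-1} - x^{k-2})\right)}_{l^{k-1}}. \qquad (44)\end{aligned}$$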
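The properties of $\theta_k$ used above and the telescoping identity (44) are purely algebraic and can be checked numerically. The sketch assumes $\theta_0 = 0$ (so that $\theta_1 = 1$) and the extrapolation $y^k = x^{k-1} + \frac{\theta_{k-1} - 1}{\theta_k}(x^{k-1} - x^{k-2})$, which is the choice that makes the first equality in (44) hold; the identity is then verified for arbitrary vectors $x^k$. Function names are illustrative.

```python
import math
import numpy as np

def theta_sequence(K):
    """theta_0 = 0 and theta_{k+1} = (1 + sqrt(1 + 4 theta_k^2)) / 2, hence theta_1 = 1."""
    th = [0.0]
    for _ in range(K):
        th.append((1.0 + math.sqrt(1.0 + 4.0 * th[-1] ** 2)) / 2.0)
    return th

def extrapolate(x_km1, x_km2, th_k, th_km1):
    """Assumed y^k = x^{k-1} + ((theta_{k-1} - 1) / theta_k) (x^{k-1} - x^{k-2})."""
    return x_km1 + ((th_km1 - 1.0) / th_k) * (x_km1 - x_km2)

def l_point(x_k, x_km1, th_k):
    """l^k = x^{k-1} + theta_k (x^k - x^{k-1})."""
    return x_km1 + th_k * (x_k - x_km1)

th = theta_sequence(50)
rng = np.random.default_rng(2)
xs = [rng.standard_normal(3) for _ in range(12)]   # arbitrary iterates x^0, ..., x^11
k = 5
y = extrapolate(xs[k - 1], xs[k - 2], th[k], th[k - 1])
residual = th[k] * (xs[k] - y) - (l_point(xs[k], xs[k - 1], th[k])
                                  - l_point(xs[k - 1], xs[k - 2], th[k - 1]))
print(residual)  # numerically zero, as (44) states
```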
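Summing over the history of $l^k$ and multiplying by $\frac{2L_d}{\theta_{k-1}^2}$, we obtain:

$$\mathrm{dist}_{\mathcal{K}}(G\hat u^k + g) \le \Big\|\sum_{i=1}^{k-1}\frac{\theta_i}{\theta_{k-1}^2}\nabla_x\mathcal{L}^{ag}_\mu(u^i, y^i)\Big\| = \frac{2L_d}{\theta_{k-1}^2}\left\|l^{k-1} - l^0\right\|.$$

Since $x^* = \arg\max d^{ag}_\mu(x)$, by taking $x = x^*$ in (43) we get:

$$\|l^k - x^*\| \le \Big(\|x^0 - x^*\|^2 + \frac{3\delta}{L_d}\sum_{i=1}^{k-1}\theta_i^2\Big)^{1/2} \le \|x^0 - x^*\| + \Big(\frac{3\delta}{L_d}\,\theta_{k-1}^2\max_{1 \le i \le k-1}\theta_i\Big)^{1/2} \le \|x^0 - x^*\| + (k-1)^{3/2}\sqrt{\frac{3\delta}{L_d}}$$

for all $k > 0$. Thus, we can further bound the primal infeasibility as follows:

$$\mathrm{dist}_{\mathcal{K}}(G\hat u^k + g) \le \frac{8L_dR_d}{k^2} + 8\sqrt{\frac{3L_d\delta}{k}}. \qquad (45)$$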

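Further, we derive sublinear estimates for primal suboptimality. First, note that, by (40):

$$\Delta(x, y^k) = \mathcal{L}^{ag}_\mu(u^k, y^k) + \langle\nabla_x\mathcal{L}^{ag}_\mu(u^k, y^k), x - y^k\rangle - d^{ag}_\mu(x) \ge f(u^k) + \langle\nabla_x\mathcal{L}^{ag}_\mu(u^k, y^k), x\rangle - d^{ag}_\mu(x).$$

Summing over the history and using the convexity of $f$, we get:

$$\sum_{i=1}^{k-1}\theta_i\Delta(x, y^i) \ge \sum_{i=1}^{k-1}\theta_i\left(f(u^i) + \langle\nabla_x\mathcal{L}^{ag}_\mu(u^i, y^i), x\rangle - d^{ag}_\mu(x)\right) \ge \theta_{k-1}^2\Big(f(\hat u^k) + \Big\langle\sum_{i=1}^{k-1}\frac{\theta_i}{\theta_{k-1}^2}\nabla_x\mathcal{L}^{ag}_\mu(u^i, y^i), x\Big\rangle - d^{ag}_\mu(x)\Big) \qquad (46)$$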


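for all $x \in \mathbb{R}^m$. Using (46) in (43) and dropping the term $L_d\|l^k - x\|^2$, we have:

$$f(\hat u^k) + \Big\langle\sum_{i=1}^{k-1}\frac{\theta_i}{\theta_{k-1}^2}\nabla_x\mathcal{L}^{ag}_\mu(u^i, y^i), x\Big\rangle - d^{ag}_\mu(x^k) \le \frac{L_d}{\theta_{k-1}^2}\|x^0 - x\|^2 + \frac{3\delta\sum_{i=1}^{k-1}\theta_i^2}{\theta_{k-1}^2}$$

for all $x \in \mathbb{R}^m$. Given that $\sum_{i=1}^{k-1}\theta_i = \theta_{k-1}^2$, $\frac{\sum_{i=1}^{k-1}\theta_i^2}{\theta_{k-1}^2} \le \max_{1 \le i \le k-1}\theta_i \le k - 1$ and $d^{ag}_\mu(x) \le f^*$, by choosing the Lagrange multiplier $x = 0$ we further have:

$$f(\hat u^k) - f^* \le f(\hat u^k) - d^{ag}_\mu(x^k) \le \frac{4L_d\|x^0\|^2}{k^2} + 3k\delta. \qquad (47)$$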
On the other hand, we have:

$$f^* = \min_{u \in U,\, s \in \mathcal{K}} f(u) + \langle x^*, Gu + g - s\rangle \le f(\hat u^k) + \left\langle x^*, G\hat u^k + g - \left[G\hat u^k + g\right]_{\mathcal{K}}\right\rangle \le f(\hat u^k) + \frac{8L_dR_d^2}{k^2} + 8R_d\sqrt{\frac{3L_d\delta}{k}}. \qquad (48)$$

Finally, from (45), (47) and (48) we get the estimates on primal infeasibility and suboptimality stated in the theorem. ■


Appendix A.3 

Proof of Theorem 3.8. First, note that a result analogous to the one in the previous appendix holds in this case; for clarity we state it below (see e.g. [20] for a proof):

Lemma 6.1 Let $\mu, \delta > 0$ and let the sequences $(x^k, y^k)_{k \ge 0}$ be generated by Algorithm ICFG applied to $d_\mu$ with inexactness level $\delta$ and $\theta_{k+1} = \frac{1 + \sqrt{1 + 4\theta_k^2}}{2}$ for all $k \ge 1$. Then, for any Lagrange multiplier $x$ and iteration $k$, we have:

$$\theta_{k-1}^2\left(d_\mu(x) - d_\mu(x^k)\right) + \sum_{i=1}^{k-1}\theta_i\Delta(x, y^i) + L_d\|l^k - x\|^2 \le L_d\|x^0 - x\|^2 + 3\delta\sum_{i=1}^{k-1}\theta_i^2, \qquad (49)$$

where we use $\Delta(x, y) = \mathcal{L}_\mu(\tilde u_\mu(y), y) + \langle\nabla_x\mathcal{L}_\mu(\tilde u_\mu(y), y), x - y\rangle - d_\mu(x)$.

Based on the same notation and reasoning as in Appendix A.2, taking $x = x^*$ in (49) and using that the terms $\theta_{k-1}^2\left(d_\mu(x^*) - d_\mu(x^k)\right)$ and $\sum_{i=1}^{k-1}\theta_i\Delta(x^*, y^i)$ are nonnegative, we obtain:

$$\mathrm{dist}_{\mathcal{K}}(G\hat u^k + g) \le \frac{8L_dR_d}{k^2} + 8\sqrt{\frac{3L_d\delta}{k}}. \qquad (50)$$

Further, we derive sublinear estimates for primal suboptimality. First, note that:

$$\Delta(x, y^k) = \mathcal{L}_\mu(u^k, y^k) + \langle\nabla_x\mathcal{L}_\mu(u^k, y^k), x - y^k\rangle - d_\mu(x) = \mathcal{L}_\mu(u^k, y^k) + \langle Gu^k + g, x - y^k\rangle - d_\mu(x) = \mathcal{L}_\mu(u^k, x) - d_\mu(x).$$


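Summing over the history and using the convexity of $\mathcal{L}_\mu(\cdot, x)$, we get:

$$\sum_{i=1}^{k-1}\theta_i\Delta(x, y^i) = \sum_{i=1}^{k-1}\theta_i\left(\mathcal{L}_\mu(u^i, x) - d_\mu(x)\right) \ge \theta_{k-1}^2\left(\mathcal{L}_\mu(\hat u^k, x) - d_\mu(x)\right). \qquad (51)$$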
Using (51) in (49), $\frac{\sum_{i=1}^{k-1}\theta_i^2}{\theta_{k-1}^2} \le \max_{1 \le i \le k-1}\theta_i \le k - 1$, and dropping the term $L_d\|l^k - x\|^2$, we have:

$$\mathcal{L}_\mu(\hat u^k, x) - d_\mu(x^k) \le \frac{L_d}{\theta_{k-1}^2}\|x^0 - x\|^2 + 3k\delta. \qquad (52)$$

Choosing the multiplier $x = 0$, we observe that $\mathcal{L}_\mu(\hat u^k, 0) \ge f(\hat u^k)$ and $d_\mu(x) \le f^* + \frac{\mu}{2}D_U^2$ for all $x \in -\mathcal{K}^*$. Then, combining these observations with (52) leads to:


$$f(\hat u^k) - f^* \le \mathcal{L}_\mu(\hat u^k, 0) - d_\mu(x^k) + \frac{\mu}{2}D_U^2 \le \frac{4L_d\|x^0\|^2}{k^2} + \frac{\mu}{2}D_U^2 + 3k\delta \le \frac{4\|G\|^2R_d^2}{\mu k^2} + \frac{\mu}{2}D_U^2 + 3k\delta.$$

We choose the optimal smoothing parameter by minimizing the above expression over $\mu$ and obtain $\mu(k) = \frac{2\sqrt{2}\,\|G\|R_d}{kD_U}$. Replacing this value in the above estimate, we obtain:

$$f(\hat u^k) - f^* \le \frac{2^{3/2}\|G\|D_UR_d}{k} + 3k\delta.$$
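The choice of the smoothing parameter above is a one-dimensional minimization of $a/\mu + b\mu$ with $a = 4\|G\|^2R_d^2/k^2$ and $b = D_U^2/2$ (the term $3k\delta$ does not depend on $\mu$), whose minimizer is $\sqrt{a/b}$ and minimum value $2\sqrt{ab}$. A quick numerical check with made-up constants (function names are illustrative):

```python
import math

def smoothing_bound(mu, norm_G, R_d, D_U, k):
    """4 ||G||^2 R_d^2 / (mu k^2) + (mu/2) D_U^2, i.e. the mu-dependent part of the bound."""
    return 4.0 * norm_G**2 * R_d**2 / (mu * k**2) + 0.5 * mu * D_U**2

def optimal_mu(norm_G, R_d, D_U, k):
    """mu(k) = 2 sqrt(2) ||G|| R_d / (k D_U)."""
    return 2.0 * math.sqrt(2.0) * norm_G * R_d / (k * D_U)

norm_G, R_d, D_U, k = 1.7, 2.5, 4.0, 100
mu_star = optimal_mu(norm_G, R_d, D_U, k)
best = smoothing_bound(mu_star, norm_G, R_d, D_U, k)
print(best, 2.0 ** 1.5 * norm_G * D_U * R_d / k)  # both equal 2^{3/2} ||G|| D_U R_d / k
```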

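Also, taking $L_d = \|G\|^2/\mu(k)$ in the feasibility gap (50), we get the estimate on infeasibility:

$$\mathrm{dist}_{\mathcal{K}}(G\hat u^k + g) \le \frac{2^{3/2}\|G\|D_U}{k} + 2^{9/4}\left(\frac{3\|G\|D_U\delta}{R_d}\right)^{1/2}.$$

On the other hand, we have:

$$f^* = \min_{u \in U,\, s \in \mathcal{K}} f(u) + \langle x^*, Gu + g - s\rangle \le f(\hat u^k) + \left\langle x^*, G\hat u^k + g - \left[G\hat u^k + g\right]_{\mathcal{K}}\right\rangle \le f(\hat u^k) + \frac{2^{3/2}\|G\|D_UR_d}{k} + 2^{9/4}\left(3\delta\|G\|D_UR_d\right)^{1/2},$$

which proves the statements of the theorem. ■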